Title: Senior Software Engineer (TPU Performance) at Google Cloud

Minimum Qualifications:

  • Bachelor’s degree or equivalent practical experience.
  • 8 years of hands-on experience in software development, including proficiency in data structures and algorithms.
  • 5 years of experience testing and launching software products, and 3 years of experience with software design and architecture.
  • 5 years of in-depth experience with machine learning algorithms and tools such as TensorFlow, with a background in artificial intelligence, deep learning, or natural language processing.

Preferred Qualifications:

  • Track record of leadership in technical roles, including leading project teams and shaping technical strategy.
  • Experience navigating complex, matrixed organizations and overseeing cross-functional and cross-business projects.
  • Proficiency in performance analysis and optimization, including system architecture and performance modeling.
  • Familiarity with compiler optimizations or related fields that improve the efficiency of software systems.
  • Exposure to distributed development and large-scale data processing.

About the Role:
Google’s software engineers develop next-generation technologies that change how billions of users engage with information and each other. Beyond web search, our products demand scalability and versatility across diverse domains including information retrieval, distributed computing, system design, and more. As a Senior Software Engineer, you’ll lead critical projects central to Google’s objectives, with opportunities to move across teams and initiatives in our dynamic environment.

The TPU Performance team pursues state-of-the-art efficiency for machine learning/AI training workloads, drawing on fleet-scale insights, benchmark analysis, and automated optimizations. Through performance analysis and optimization, we advance Google’s ML capabilities and demonstrate that efficiency at the largest scale and on the latest accelerators.

Responsibilities:

  • Drive optimization efforts for large language models (LLMs) such as Google DeepMind’s Gemini and Bard, focusing on performance analysis and enhancements.
  • Curate and maintain LLM training and serving benchmarks aligned with Google production standards, industry benchmarks, and ML community benchmarks.
  • Collaborate with Google product teams to address LLM performance challenges, including onboarding new models and optimizing for Google’s latest TPU hardware.
  • Explore novel techniques to improve model and data efficiency, aiming to optimize ML tasks and reduce data requirements.