About the role

Aplyr's Quick Take

This role is for a hands-on ML Research Engineer focused on hardware codesign, bridging the gap between AI model research and silicon architecture. You'll be debugging performance issues, writing quantization kernels, and collaborating with both research and hardware teams to optimize AI workloads.

Good fit

Ideal candidates have a strong background in machine learning, hardware design, and system architecture, with several years of relevant experience. A proactive and communicative working style will help you thrive in this collaborative environment.

Worth noting

The position requires a hybrid work model with three days onsite in San Francisco, and relocation assistance is available. The role emphasizes tackling complex problems and driving projects to production, which may appeal to those who enjoy hands-on challenges.

About the Team

OpenAI’s Hardware organization develops silicon and system-level solutions designed for the unique demands of advanced AI workloads. The team is responsible for building the next generation of AI silicon while working closely with software and research partners to co-design hardware tightly integrated with AI models. In addition to delivering production-grade silicon for OpenAI’s supercomputing infrastructure, the team also creates custom design tools and methodologies that accelerate innovation and enable hardware optimized specifically for AI.

About the Role

We’re seeking a Research-Hardware Codesign Engineer to operate at the boundary between model research and silicon/system architecture. You’ll help shape the numerics, architecture, and technology bets of future OpenAI silicon in collaboration with both Research and Hardware.

Your work will include debugging gaps between rooflines and reality, writing quantization kernels, derisking numerics via model evals, quantifying system architecture tradeoffs, and implementing novel numeric RTL. This is a hands-on role for people who go looking for hard problems, get to ground truth, and drive their solutions to production. Strong prioritization and clear, honest communication are essential.

Location: San Francisco, CA (Hybrid: 3 days/week onsite)
Relocation assistance available.

In this role you will:

Build on our roofline simulator to track evolving workloads, and deliver analyses that quantify the impact of system architecture decisions and support technology pathfinding.
Debug gaps between performance simulation and real measurements; clearly communicate root cause, bottlenecks, and invalid assumptions.
Write emulation kernels for low-precision numerics and lossy compression schemes, and get Research the information they need to trade off efficiency against model quality.
Prototype numerics modules by pushing RTL through synthesis; hand off novel numerics cleanly, or occasionally own an RTL module end-to-end.
Proactively pull in new ML workloads, prototype them with rooflines and/or functional simulation, and drive initial evaluation of new opportunities or risks.
Understand the whole picture from ML science to hardware optimization, and slice this end-to-end objective into near-term deliverables.
Build ad-hoc collaborations across teams with very different goals and areas of expertise, and keep progress unblocked.
Communicate design tradeoffs clearly with explicit assumptions and confidence levels; produce a trail of evidence that enables confident execution.

You will thrive in this role if you have:

An exceptional track record of high-quality technical output, and a bias for shipping a prototype now and iterating later in the absence of clear requirements.
Strong Python, and C++ or Rust, with a cautious attitude toward correctness and an intuition for clean extensibility.
Experience writing Triton, CUDA, or similar, and an understanding of the resulting mapping of tensor ops to functional units.
Working knowledge of PyTorch or JAX; experience in large ML codebases is a plus.
Practical understanding of floating point numerics, the ML tradeoffs of reduced precision, and the current state of the art in model quantization.
Deep understanding of transformer models, and strong intuition for transformer rooflines and the tradeoffs of sharded training and inference in large-scale ML systems.
Experience writing RTL (especially for floating point logic) and an understanding of PPA (power, performance, area) tradeoffs are a plus.
Strong cross-functional communication (e.g. across ML researchers and hardware engineers); ability to slice ambiguous early-incubation ideas into concrete arenas in which progress can be made.

To comply with U.S. export control laws and regulations, candidates for this role may need to meet certain legal status requirements as provided in those laws and regulations.

Skills & Tags

python go rust aws ai data product design

Aplyr's read

OpenAI is a pioneering AI research organization focused on developing AGI to benefit humanity, attracting talent keen on ethical and innovative AI solutions.
Synthesized from recent postings & public sources

What's promising

• OpenAI is at the forefront of AI research, driving innovation in artificial general intelligence.
• The company emphasizes ethical AI development, appealing to those passionate about responsible technology.
• OpenAI offers diverse roles across engineering, sales, and creative fields, indicating a dynamic work environment.

What to watch

• OpenAI's rapid growth may lead to organizational challenges and resource strain.
• The focus on AGI poses inherent risks and ethical dilemmas that require careful navigation.
• Limited public information about internal culture and work-life balance may concern potential applicants.

Why OpenAI

• OpenAI's mission to ensure AGI benefits all of humanity sets it apart in the tech industry.
• The organization combines cutting-edge research with a strong emphasis on ethical AI practices.
• OpenAI's diverse hiring across global markets reflects its commitment to international impact and collaboration.

Aplyr’s read is generated by AI from public sources.

About OpenAI

OpenAI

openai.com

OpenAI is an artificial intelligence research organization that aims to ensure that artificial general intelligence (AGI) benefits all of humanity. It develops advanced AI technologies and promotes safe and responsible AI practices.