Machine Learning Engineer, AI
CZ Biohub
Job Description
Biohub is the first large-scale initiative bringing frontier AI models, massive compute, and frontier experimental capabilities under one roof. We're building a general-purpose system to accelerate scientific discovery, integrating frontier AI models, biological foundation models, and lab capabilities, with the ultimate goal of curing disease. Our technology powers scientists around the world, translating AI capabilities into tools that accelerate research everywhere.
Biohub operates one of the largest AI compute clusters dedicated to biology, spanning three frontier research institutes with some of the world's leading biologists. We're not a startup trying to find product-market fit, and we're not a pharma company optimizing a pipeline. We're building frontier AI for fundamental science, as open science, at a scale no one else is attempting. This is a unique moment for scientific acceleration. These are among the hardest and most impactful problems you can choose to work on, and we move at a pace that meets this moment.
Our research spans:
- Frontier molecular modeling, from protein language models (e.g., ESM) to structure prediction (e.g., ESMFold) and beyond
- Scaled biological foundation models trained on some of the largest GPU clusters dedicated to science
- Imaging foundation models trained across the world's largest microscopy datasets
- Reasoning and agentic systems that connect frontier LLMs with biological foundation models
- Mechanistic interpretability of biological foundation models: extracting new biological knowledge directly from model weights
- Scientific data at unprecedented scale: AI systems to collect, curate, and learn from some of the richest biological datasets ever assembled
Join our Team!
As an ML Engineer, you'll join some of the strongest infrastructure engineers in AI, building the systems that connect everything together. The infrastructure problems you solve directly determine what science becomes possible.
What You'll Do
- Work with high-dimensional scientific data formats and contribute to backend compatibility, format evaluation, and I/O performance benchmarking at petabyte scale.
- Define and shape the engineering patterns your team and collaborating researchers will build on for years; the abstractions you write today become the foundation others depend on at scale.
- Work at the intersection of AI systems and biological discovery.
- Deploy models to production and manage artifact tracking across models and datasets.
- Design and optimize GPU-native data loading pipelines for large-scale multi-dimensional tensor workloads, including profiling and resolving hardware utilization bottlenecks across multi-backend systems.
- Simplify and improve codebase abstractions to accelerate research momentum.
- Build and maintain primitives for pre-training infrastructure that ensure the reliability and continuity of large-scale training runs.
- Help cultivate best practices in MLOps and think about the full ML lifecycle, including data, fine-tuning, deployment, reliability, and monitoring.
- Execute complex modifications to the research pipeline, such as fast data loading and distributed training.
- Handle DevOps responsibilities focused on making all engineers and researchers more productive. This includes cluster monitoring, unit and integration testing of the research codebase, and continuous integration.
- Collaborate with partner researchers and engineers to deploy our technology within external infrastructure.
What You'll Bring
- Hands-on experience with PyTorch, including custom training loops, distributed training, or low-level performance work.
- Familiarity with GPU-native data I/O tools and large-scale tensor formats (e.g., Zarr, HDF5, TensorStore, or similar).
- Experience with distributed computing frameworks such as Apac