About the role
At NVIDIA, we’re not just building the future, we’re generating it! Our world model team is pushing the boundaries of multimodal AI, robotics, and world foundation models for Physical AI. We are looking for a Senior Research Manager to lead world-model evaluation and benchmarking across NVIDIA’s Physical AI model portfolio. This role will build the team and research agenda for evaluating world models through closed-system evaluations, where the model under test is pluggable, and open-system evaluations, where access to model internals enables deeper diagnostics, causal analysis, and mechanistic evaluation.
This is not only about leaderboards. It is about defining what makes a world model useful for Physical AI, discovering model failures, and turning those findings into better data, training recipes, model roadmaps, and downstream systems. The team will build a closed improvement loop across model evaluation, failure discovery, data generation, post-training, and re-evaluation.
What you’ll be doing:
Lead a team of Research Scientists focused on world-model evaluation, benchmarking, and diagnostics for NVIDIA Physical AI models, including world foundation models, world-action models, synthetic data generation systems, robotics, simulation, and embodied AI workflows.
Define the scientific roadmap for closed-system and open-system evaluation, including open-loop and closed-loop benchmarks, metrics, failure taxonomy, model comparison, and evaluation-to-training feedback loops.
Develop benchmarks for physical plausibility, temporal consistency, scene dynamics, object permanence, spatial reasoning, action conditioning, affordances, controllability, long-horizon coherence, synthetic data generation (SDG) quality, and world-action model (WAM) usefulness.
Develop open-system and mechanistic evaluation methods using model internals, including representation probing, causal interventions, activation analysis, ablations, sparse autoencoders, attention and feature analysis, and circuit-style diagnostics.
Drive evaluation-to-model-improvement loops with training, post-training, data curation, simulation, robotics, SDG, WAM, and applied research teams, including failure discovery, data generation, post-training priorities, model roadmap feedback, and re-evaluation.
Publish high-quality papers, technical reports, benchmarks, and open-source evaluation artifacts while establishing rigorous standards for validity, reproducibility, dataset hygiene, leakage prevention, and model comparison.
What we need to see:
Strong research background in machine learning, computer vision, multimodal AI, robotics, world models, representation learning, model evaluation, or mechanistic interpretability.
Experience leading research teams, research programs, or cross-functional technical initiatives with measurable scientific and product impact.
Deep understanding of modern foundation models, including video models, vision-language-action models, diffusion or flow models, self-supervised learning, or world-model architectures.
Experience designing serious benchmarks, evaluation datasets, metrics, diagnostic tools, or model analysis frameworks for complex ML systems.
Familiarity with world-model evaluation and open-system analysis techniques, such as physical plausibility, temporal consistency, action conditioning, counterfactual reasoning, representation probing, activation patching, causal interventions, sparse autoencoders, or feature attribution.
PhD or equivalent experience in Computer Science, Electrical Engineering, Robotics, Machine Learning, AI, or a related field, with 12+ years of overall relevant research or engineering experience and 5+ years of management experience.
Ability to work onsite at NVIDIA’s Santa Clara headquarters; this is not a remote position.
Ways to stand out from the crowd:
Built influential benchmarks, evaluation suites, model diagnostics, or interpretability tools used by research or production teams.
Published in areas such as world models, video generation, physical AI, embodied AI, robotics, representation learning, mechanistic interpretability, self-supervised learning, or model evaluation.
Experience evaluating generative video models, action-conditioned world models, robotics foundation models, world-action models, synthetic data generation systems, simulation systems, or vision-language-action models.
Strong point of view on what current benchmarks miss, and excitement to build the next generation of evaluation science for Physical AI.
NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people on the planet working for us. If you're creative, passionate and self-motivated, we want to hear from you! NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services.
You will also be eligible for equity and benefits.
This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is committed to fostering an inclusive work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
Aplyr's read
NVIDIA is a pioneering force in GPUs and AI, attracting top talent in engineering and innovation-driven roles across various tech domains.
What's promising
- NVIDIA leads the GPU market, crucial for gaming and AI applications.
- The company invests heavily in AI and deep learning, driving technological advancements.
- NVIDIA's strong market position offers stability and growth opportunities for employees.
What to watch
- High competition in the semiconductor industry can impact market share.
- Rapid technological changes require constant adaptation and learning.
- Intense workload and high expectations may affect work-life balance.
Why NVIDIA
- NVIDIA's GPUs are industry benchmarks in gaming and professional graphics.
- The company's AI research is at the forefront of deep learning innovation.
- NVIDIA's culture emphasizes cutting-edge technology and engineering excellence.
Aplyr’s read is generated by AI from public sources.
About NVIDIA
NVIDIA is a leading technology company known for its graphics processing units (GPUs) for gaming and professional markets, as well as its advancements in artificial intelligence and deep learning.