Internship
Research Intern – Video World Models (Research & ML Systems)
Tencent
Compensation
$80,168.40 - $124,800/year
US-California-Palo Alto
On-site
Posted March 26, 2026
Job Description
What the Role Entails
About the Position
We are seeking an exceptional Research Intern to join our core team in building the next generation of interactive Video World Models. While traditional generative AI focuses on generating passive pixels (e.g., text-to-video), our mission is fundamentally more ambitious: we are building foundational "World Models" that inherently understand physics, causality, action spaces, and complex dynamics directly from internet-scale data. Our goal is to train models that can simulate and "dream" complex virtual worlds, allowing users and agents to explore and interact with them in real time.
This is not a purely theoretical role.
Training interactive world models at this scale requires pushing the absolute limits of modern GPUs. We operate at the intersection of cutting-edge generative AI research and high-performance machine learning systems. We are looking for "full-stack" hacker-researchers—visionary thinkers who are also elite engineers, capable of co-designing novel neural architectures and engineering the highly optimized infrastructure required to train them across thousands of GPUs.
What You Will Do
- Architect & Scale Foundation Models: Design, train, and scale state-of-the-art interactive world models (combining Diffusion, Autoregressive Transformers, VAEs, LLMs, VLMs) on massive video datasets.
- Push the Boundaries of ML Systems: Architect highly scalable distributed training pipelines, utilizing advanced model and data parallelism to train massive models efficiently on large-scale GPU clusters.
- Optimize for Efficiency: Profile and optimize model architectures to break through memory and compute bottlenecks. Write high-performance, custom hardware kernels to maximize Model FLOPs Utilization (MFU) and enable real-time, low-latency inference.
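For context, the Model FLOPs Utilization (MFU) metric mentioned above can be sketched with a back-of-the-envelope estimate. This is an illustrative sketch only (the function name, numbers, and hardware figures are hypothetical, not from the posting), using the common ~6N FLOPs-per-token approximation for dense transformer training:

```python
# Hypothetical MFU estimate; all names and numbers here are illustrative.

def mfu(params: float, tokens_per_sec: float, num_gpus: int,
        peak_flops_per_gpu: float) -> float:
    """Model FLOPs Utilization: achieved training FLOPs / peak hardware FLOPs.

    Uses the standard ~6 * N FLOPs-per-token approximation for dense
    transformer training (forward + backward pass).
    """
    achieved = 6.0 * params * tokens_per_sec      # FLOPs actually performed per second
    peak = num_gpus * peak_flops_per_gpu          # theoretical hardware ceiling
    return achieved / peak

# Example: a 7B-parameter model training at 250k tokens/s across 64 GPUs,
# each with an assumed ~989 TFLOPs of BF16 peak throughput.
print(round(mfu(7e9, 2.5e5, 64, 989e12), 3))  # → 0.166
```

Raising this ratio, e.g. through fused kernels, better parallelism layouts, or reduced memory traffic, is what "maximizing MFU" refers to in practice.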
Who We Look For
Requirements
- Academic Excellence: Currently pursuing a PhD (or Master’s degree with a truly exceptional research/engineering track record) in Computer Science, Machine Learning, Computer Architecture, or a related field.
- Engineering Skills: Exceptional, production-level coding proficiency in Python or other languages. Background in competitive programming is a great plus.
- AI Infrastructure & Scaling: Experience with modern AI infrastructure stack and large-scale machine learning systems, such as PyTorch FSDP, Megatron, etc. Experience with GPU kernels using CUDA and/or Triton is a great plus.
- Deep Generative Expertise: Thorough theoretical and practical understanding of modern generative paradigms (Diffusion, Vision Transformers, Autoregressive sequence modeling, discrete tokenization/VAEs).
- Top-Tier Publication Record: First-author publications in top-tier AI venues (NeurIPS, ICLR, ICML, CVPR, ICCV) OR premier ML Systems venues (MLSys, OSDI, ASPLOS).
Location State(s)
US-California-Palo Alto
The expected base pay range for this position in the location(s) listed above is $80,168.40 to $124,800.00 per year. Actual pay may vary depending on job-related knowledge, skills, and experience. This position will be eligible for 1 hour of paid sick leave for every 30 hours worked and up to 13 paid holidays throughout the calendar year. Subject to the terms and conditions of the applicable plans then in effect, full-time interns are also eligible to enroll in the Company-sponsored medical plan.
Equal Employment Opportunity at Tencent
As an equal opportunity employer, we firmly believe that diverse voices fuel our innovation and allow us to better serve our users and the community. We foster an environment where every employee of Tencent feels supported and inspired to achieve individual and common goals.