Research Scientist Intern

XPENG Motors

Santa Clara, CA
On-site
Posted March 18, 2026

Job Description

XPENG is a leading smart technology company at the forefront of innovation, integrating advanced AI and autonomous driving technologies into its vehicles, including electric vehicles (EVs), electric vertical take-off and landing (eVTOL) aircraft, and robotics. With a strong focus on intelligent mobility, XPENG is dedicated to reshaping the future of transportation through cutting-edge R&D in AI, machine learning, and smart connectivity.
 

About the Role

We are actively seeking a full-time Research Scientist Intern to drive the modeling and algorithmic development of XPENG’s next-generation Vision-Language-Action (VLA) Foundation Model — the core brain that powers our end-to-end autonomous driving systems.
 
You will work closely with world-class researchers, perception and planning engineers, and infrastructure experts to design, train, and deploy large-scale multi-modal models that unify vision, language, and control. Your work will directly shape the intelligence that enables XPENG’s future L3/L4 autonomous driving products.

Key Responsibilities

  • Conduct research on designing and implementing large-scale multi-modal architectures (e.g., vision–language–action transformers) for end-to-end autonomous driving.
  • Design and integrate cross-modal alignment techniques (e.g., visual grounding, temporal reasoning, policy distillation, imitation and reinforcement learning) to improve model interpretability and action quality.
  • Collaborate closely with researchers and engineers across the modeling and infrastructure teams.
  • Contribute to publications at top-tier AI/CV/ML conferences and present research findings.

Minimum Qualifications

  • Currently enrolled in a Master's or Ph.D. program in Computer Science, Electrical/Computer Engineering, or a related field, with a specialization in CV, NLP, or ML.
  • Experience in multi-modal modeling (vision, language, or planning), with a deep understanding of representation learning, temporal modeling, and reinforcement learning techniques.
  • Strong prof