Staff Generative AI Research Engineer, Multimodal, Agent Modeling - SIML

Apple

Shanghai

On-site

Posted February 26, 2026

Job Description

Summary

Are you passionate about Generative AI? Are you interested in working on cutting-edge generative modeling technologies to enrich the lives of billions of people? We are driving multiple initiatives focused on advancing generative models, and we are seeking technical leaders experienced in training, adapting, and deploying large-scale generative models. This role emphasizes multimodal understanding and generation, and the development of agentic systems that push the boundaries of what AI can achieve responsibly.

We are the Intelligence System Experience (ISE) team within Apple’s software organization. The team operates at the intersection of multimodal machine learning and system experiences. It oversees a range of experiences such as System Experience (Springboard, Settings), Image Generation, Genmoji, Writing tools, Keyboards, Pencil & Paper, and Generative Shortcuts, all powered by production-scale ML workflows.

Our multidisciplinary ML teams focus on a broad spectrum of areas, including Visual Generation Foundation Models; Multimodal Understanding; Visual Understanding of People, Text, Handwriting, and Scenes; Personalization; Knowledge Extraction; Conversation Analysis; Behavioral Modeling for Proactive Suggestions; and Privacy-Preserving Learning. These innovations form the foundation of the seamless, intelligent experiences our users enjoy every day.

We are seeking Staff Research Engineers to lead the architecture and innovation of multimodal and agentic reasoning across modalities. The ideal candidate will drive cross-functional efforts spanning ML modeling, prototyping, validation, and privacy-preserving learning. Strong expertise in machine learning and generative AI, with a proven ability to translate research into production-grade systems, is essential. Experience in agentic reasoning, reinforcement and preference learning, and vision-language multimodal understanding and reasoning is highly desirable.

Selected references to our team’s work:
https://arxiv.org/pdf/2507.13575
https://arxiv.org/pdf/2407.21075
https://www.apple.com/newsroom/2024/12/apple-intelligence-now-features-image-playground-genmoji-and-more/

Description

We are looking for a candidate with a proven track record in applied ML research. Responsibilities in the role will include training large-scale multimodal (2D/3D vision-language) models on distributed backends, deploying efficient neural architectures on device and on private cloud compute, and addressing emerging safety challenges to make the models and agents robust and aligned with human values. A key focus of the position is ensuring real-world quality, emphasizing model and agent safety, fairness, and robustness. You will collaborate closely with ML researchers, software engineers, and hardware and design teams across multiple disciplines. The core responsibilities include advancing the multimodal capabilities of large language models and strengthening AI safety and security for agentic workflows. On the user experience front, the work will involve aligning image and video content to the space of LLMs for visual actions and multi-turn interactions, enabling rich, intuitive experiences powered by agentic AI systems.

Minimum Qualifications

PhD or MSc in Computer Science, Electrical Engineering, or a related field (mathematics, physics, or computer engineering) with a focus on computer vision and/or machine learning, or comparable professional experience
Strong ML and generative modeling fundamentals
Experience with one or more of the following: pre-training or post-training of multimodal LLMs, reinforcement learning
Familiarity with distributed training
Proficiency in using ML toolkits, e.g., PyTorch
Proven track record of research contributions demonstrated through publications in top-tier conferences, and demonstrated leadership in both applied research and development

Preferred Qualifications

Experience with building and deploying AI agents, LLMs for tool use, and multimodal LLMs
Awareness of the challenges associated with transitioning a prototype into a final product