Principal Machine Learning Engineer, Mobile AI Inference Optimization
Unity Technologies
Compensation
$278,100 - $347,600/year
Job Description
The opportunity
We are building the next generation of mobile game AI experiences by deploying world models on-device to mobile hardware. As our Principal Machine Learning Engineer, you will be the foremost technical authority on bringing state-of-the-art multi-modal models (transformers, diffusion networks, and JEPA-style architectures) from research to production on mobile hardware.
This is a deeply hands-on, high-impact role. You will define the inference strategy, drive architectural decisions across the full mobile ML stack, and mentor a team of senior and mid-level engineers. Your work will directly determine the latency, quality, and power profile of AI-driven features experienced by billions of mobile game players.
What you'll be doing
- Technical Leadership:
- Set the technical vision and roadmap for deploying multi-modal AI models to iOS and Android, spanning transformers, diffusion models, and JEPA-style generative architectures.
- Make authoritative decisions on model compression, quantization, pruning, and knowledge distillation strategies to meet mobile latency and memory budgets.
- Evaluate and select inference runtimes (e.g., CoreML, ONNX Runtime Mobile, TFLite, ExecuTorch) and drive adoption across the team.
- Own the end-to-end optimization pipeline: from model export and graph transformation to hardware-specific kernel tuning on NPU, GPU, and CPU.
- Architecture & Research Translation:
- Collaborate directly with research scientists to translate novel model architectures into deployable, mobile-optimized implementations.
- Design scalable systems for multi-modal inference that process diverse inputs — images, text, primitives, and metadata — and produce pixel-level outputs with real-time performance.
- Pioneer new approaches to dynamic resolution, token reduction, and speculative decoding tailored to mobile constraints.
- Track and rapidly adopt breakthroughs in efficient diffusion (e.g., consistency models, flow matching) and efficient attention (e.g., FlashAttention, linear attention variants).
- Team & Cross-Functional Leadership:
- Lead and mentor a team of ML engineers; define engineering best practices, code review standards, and on-device benchmarking methodology.
- Partner with platform engineers, product managers, and runtime teams to align ML capabilities with device SKU constraints and product roadmaps.
- Champion a culture of measurement: define KPIs for latency, accuracy, memory, and power consumption and ensure the team tracks them rigorously.
What we're looking for
- 8+ years in ML engineering, with at least 3 years focused on on-device / edge inference optimization.
- Proven production deployment of transformer-based models (e.g., ViT, LLaMA, Stable Diffusion) and/or JEPA-style generative architectures on mobile or embedded hardware.
- Hands-on expertise with CoreML, TFLite, ONNX Runtime, and/or ExecuTorch; deep understanding of operator fusion, memory layout, and runtime scheduling.
- Expert-level command of INT8/INT4/FP16 quantization, weight sharing, structured/unstructured pruning, and knowledge distillation.
- Strong understanding of mobile SoC architectures (Apple Neural Engine, Qualcomm Hexagon/Adreno, ARM Mali) and how to target each for peak throughput.
- Proficiency in C++ / Objective-C / Swift for runtime integration; solid Python for training-side tooling and export pipelines.
- Ability to read, implement, and extend ML research papers; familiarity with efficient attention, diffusion samplers, and multi-modal fusion techniques.
- Track record of technical leadership: setting direction, influencing cross-functional partners, and growing engineers.
You might also have
- Experience shipping world-model or neural rendering pipelines (NeRF, 3DGS, or similar) on mobile.
- Contributions to open-source ML inference frameworks or mobile ML research publications.
- Familiarity with compiler stacks such as MLIR, TVM, or XLA for custom kernel generation.
- Background in real-time graphics or game engine pipelines (Metal, Vulkan, OpenGL ES).
Additional information
- International relocation support is not available for this pos