Mid-Level
Machine Learning Engineer II, Amazon Music - AI and Personalization
Confirmed live in the last 24 hours
Amazon.com Services LLC
Seattle, WA, USA
On-site
Posted March 25, 2026
Job Description
Amazon Music is an immersive audio entertainment service that deepens connections between fans, artists, and creators. From personalized music playlists to exclusive podcasts, concert livestreams to artist merch, Amazon Music is innovating at some of the most exciting intersections of music and culture. We offer experiences that serve all listeners with our different tiers of service: Prime members get access to all the music in shuffle mode, and top ad-free podcasts, included with their membership; customers can upgrade to Amazon Music Unlimited for unlimited, on-demand access to 100 million songs, including millions in HD, Ultra HD, and spatial audio; and anyone can listen for free by downloading the Amazon Music app or via Alexa-enabled devices. Join us for the opportunity to influence how Amazon Music engages fans, artists, and creators on a global scale. Learn more at https://www.amazon.com/music
We are seeking a Machine Learning Engineer to join the Amazon Music AI and Personalization team and drive model training efficiency and inference optimization improvements. In this role, you will work at the intersection of machine learning and systems engineering, ensuring our models train faster, cost less, and run efficiently in production environments. You will collaborate closely with research scientists, platform engineers, and product teams to deliver scalable, high-performance ML solutions that help listeners discover music and content they love.
Key job responsibilities
Model Training Optimization
- Design and implement strategies to improve training throughput and reduce time-to-convergence
- Profile and eliminate bottlenecks in data loading, preprocessing, and model computation
- Develop and maintain training infrastructure that scales efficiently with model and dataset size
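As a rough illustration of the profiling work described above, the helper below separates time spent waiting on the data pipeline from time spent in model computation. It is a framework-agnostic sketch: `batches` and `train_step` are hypothetical stand-ins, not part of any specific training framework.

```python
import time

def profile_training_step(batches, train_step, warmup=2):
    """Split wall-clock time per step into data-loading vs. compute.

    `batches` is any iterable of training batches; `train_step` is a
    hypothetical callable that runs one forward/backward pass. Both are
    illustrative placeholders.
    """
    data_time, compute_time, steps = 0.0, 0.0, 0
    t0 = time.perf_counter()
    for i, batch in enumerate(batches):
        t1 = time.perf_counter()          # time since last step = data wait
        train_step(batch)
        t2 = time.perf_counter()          # time inside the step = compute
        if i >= warmup:                   # skip warm-up steps (caches, pipelines)
            data_time += t1 - t0
            compute_time += t2 - t1
            steps += 1
        t0 = time.perf_counter()
    return {
        "steps": steps,
        "avg_data_s": data_time / max(steps, 1),
        "avg_compute_s": compute_time / max(steps, 1),
    }
```

If `avg_data_s` dominates, the bottleneck is in loading/preprocessing; if `avg_compute_s` dominates, the model computation itself is the target for optimization.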
Inference Optimization
- Optimize models for low-latency, high-throughput production inference
- Implement and benchmark inference optimizations across various hardware targets (GPU, CPU, edge devices)
- Establish performance benchmarks and monitoring for inference pipelines
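A minimal sketch of the latency benchmarking mentioned above, reporting the tail percentiles (p50/p95/p99) that typically drive inference SLOs. `predict` and `example` are hypothetical placeholders for a deployed model call and a representative input.

```python
import statistics
import time

def benchmark_inference(predict, example, n=100, warmup=10):
    """Collect p50/p95/p99 latency (ms) for a hypothetical `predict` callable."""
    for _ in range(warmup):               # warm caches / JIT before measuring
        predict(example)
    latencies = []
    for _ in range(n):
        t0 = time.perf_counter()
        predict(example)
        latencies.append((time.perf_counter() - t0) * 1000.0)  # ms
    q = statistics.quantiles(latencies, n=100)  # 99 percentile cut points
    return {"p50_ms": q[49], "p95_ms": q[94], "p99_ms": q[98]}
```

In practice the same harness would be run per hardware target (GPU, CPU, edge) and per batch size to compare optimizations on equal footing.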
Service Ownership & Operations
- Own production services that support ML decision models, including ranking services, orchestration layers, and model-serving infrastructure
- Participate in on-call rotation to ensure service reliability, respond to operational issues, and drive continuous improvement
- Design and implement monitoring, alerting, and observability solutions for ML services to proactively identify and resolve issues
- Manage service dependencies, API contracts, and integration points between ML models and downstream systems
- Drive operational excellence through automation, runbook development, and post-incident reviews
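The monitoring and alerting responsibilities above can be sketched as a simple SLO check; the thresholds (`p99_budget_ms`, `error_budget`) are invented example values, not figures from this posting.

```python
def check_slo(p99_ms, error_rate, p99_budget_ms=120.0, error_budget=0.001):
    """Return alert messages for any breached (hypothetical) SLO threshold."""
    alerts = []
    if p99_ms > p99_budget_ms:
        alerts.append(
            f"p99 latency {p99_ms:.1f} ms exceeds {p99_budget_ms:.1f} ms budget"
        )
    if error_rate > error_budget:
        alerts.append(
            f"error rate {error_rate:.4f} exceeds {error_budget:.4f} budget"
        )
    return alerts
```

A real service would feed these checks from metrics emitted by the serving layer and route non-empty results to an alerting system.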
Cross-Functional Collaboration
- Partner with research teams to understand model architectures and identify optimization opportunities
- Collaborate with Science/ML teams on service integration points and ownership boundaries for ML components
- Contribute to best practices and tooling for ML efficiency across the organization
- Evaluate emerging hardware and software technologies for potential adoption
A day in the life
An MLE's day typically begins with checking model performance metrics and reviewing overnight training runs. Mornings often involve team standups and planning sessions. The core work includes cleaning and preprocessing data, developing and fine-tuning models, writing Python code (both independently and with GenAI coding tools), and debugging pipelines. Afternoons might feature collaboration with data scientists and software engineers, code reviews, and deploying models to production. Service ownership responsibilities include monitoring production systems, responding to alerts, participating in on-call rotations, and ensuring model reliability and performance in live environments. Time is also spent reading research papers and attending annual conferences to stay current on state-of-the-art model training and online inference optimization techniques.
Basic Qualifications
- Bachelor's degree in computer science or equivalent
- 3+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
- Experience in machine learning, data mining, information retrieval, statistics, or natural language processing
- Experience programming with at least one modern language such as Java, C++, or C#, including object-oriented design
- Experience with Machine Learning and Large Language Model fundamentals, including architecture, training/inference lifecycles, and optimization of model execution
- Experience building complex software systems that have been successfully delivered to customers
- Experience with Machine and