About the role
Summary
Join Apple Services Engineering to build the next generation of AI evaluation systems. We are seeking a staff machine learning platform engineer to lead the architectural design and development of the high-availability services and internal tools powering self-service evaluation at scale. You will partner with researchers to operationalize their innovations, transforming complex workflows into intuitive, developer-first platforms. We are looking for builders who thrive in the ambiguity of new initiatives and are passionate about creating scalable infrastructure.
Description
You will join the engineering team responsible for democratizing AI evaluation across the organization. Your focus will be the developer experience: architecting and implementing the APIs, SDKs, and platform services that turn complex evaluation metrics into simple, self-service calls. You will work hand-in-hand with researchers to operationalize sophisticated measurement techniques, ensuring they scale reliably within our high-availability infrastructure. In this role, you will drive the engineering standards for a new organization, upholding the code quality, automation, and testing rigor required to support the rapid evolution of Generative AI and Agentic systems.
Minimum Qualifications
- 8+ years of hands-on software engineering experience, with a track record of owning the technical direction of a platform or infrastructure domain.
- Strong proficiency in the Python ecosystem (e.g., FastAPI, Pydantic, Pandas). You write production-grade code and lead architectural discussions on day one.
- Customer Obsession & Product Thinking: You have owned the technical roadmap for an internal platform, presented it to senior stakeholders, and shipped against it. You independently translate vague requirements from other teams into concrete engineering specifications and platform roadmaps.
- Demonstrated experience leading technical partnerships with Data Scientists or Researchers: You have taken research code and shipped it as a production service, and built the abstractions, testing frameworks, and deployment pipelines that made the next handoff faster than the last.
- Strong expertise in API Design & Platform Infrastructure: You have designed and owned APIs and SDKs that other developers rely on, with a focus on versioning, backward compatibility, and developer experience at scale.
- Operational excellence background: You have architected and owned CI/CD pipelines, containerization (Docker/Kubernetes), and monitoring (Datadog/Prometheus) for production services, and have been accountable for their reliability.
- Bachelor's in Computer Science or related field; Master's preferred.
Preferred Qualifications
- Deep familiarity with AI Evaluation Frameworks: You have built, extended, or contributed to modern evaluation tools like DeepEval, Ragas, TruLens, or LangSmith. You understand how to implement and scale model-based evaluation workflows across a large organization.
- Evaluation Service Deployment: Experience owning the deployment, scaling, and operational health of evaluation services in production, including high-throughput evaluation job orchestration (queueing, prioritization, concurrency, auto-scaling) and defining SLAs for evaluation pipeline latency and availability.
- Observability & Reliability: Experience instrumenting production ML evaluation pipelines, including tracking evaluation job throughput, queue depth, judge model latency SLAs, scoring drift over time, and failure modes specific to non-deterministic LLM-based evaluation workflows.
- Deep understanding of Generative AI & Agents: You understand the engineering challenges of relying on LLMs and Agents as software components, specifically managing token economics, handling rate limits, and evaluating non-deterministic, multi-step reasoning capabilities. You have built production systems that depend on these components and have solved these problems at scale.
- Builder Experience: You have thrived in startup-like environments, navigating high ambiguity to deliver complex technical roadmaps from scratch.
Aplyr's read
Apple is a tech giant known for its sleek design and innovation, attracting top talent in engineering, design, and business operations.
What's promising
- Apple consistently leads in tech innovation with a strong focus on design and user experience.
- The company's global brand recognition offers employees a prestigious platform for career growth.
- Apple's robust ecosystem integrates hardware, software, and services, creating diverse job opportunities.
What to watch
- High-pressure work environment with demanding deadlines can impact work-life balance.
- Apple's secretive culture may limit transparency and cross-departmental communication.
- Dependence on hardware sales makes the company vulnerable to market saturation risks.
Why Apple
- Apple's design philosophy emphasizes simplicity and elegance, setting it apart in the tech industry.
- The company has a unique retail presence with its own stores enhancing customer experience.
- Apple's closed ecosystem creates a seamless integration across its products, unmatched by competitors.
Aplyr's read is generated by AI from public sources.
About Apple
Apple Inc. is a leading technology company known for its innovative consumer electronics, software, and services. The company designs and manufactures products such as the iPhone, iPad, Mac computers, and wearables, significantly influencing the tech industry and consumer behavior worldwide.