About the role

Summary

Do you have a passion for computer vision and solving deep learning problems? The Video Engineering Data Analytics and Quality group is seeking an expert in evaluating machine learning and deep learning models, including foundation models and multimodal systems. This role will play a critical part in crafting robust evaluation frameworks, using both traditional statistical methods and modern techniques like LLM-as-a-Judge! The ideal candidate combines strong analytical thinking, expertise in Python, and advanced knowledge of statistical methodologies and data quality standards. You will collaborate with teams at Apple that are passionate about developing foundation models, including ML engineers, data scientists, and ML infrastructure engineers, to deliver amazing user experiences!

Description

•Develop robust methodologies to assess the performance of foundation models (e.g., LLMs, vision-language models, etc.) across diverse tasks.
•Leverage LLMs as judges to perform subjective and open-ended model evaluations (e.g., for summarization, reasoning, or multimodal generation tasks).
•Build, curate, and lead evaluation datasets and benchmarks.
•Advanced proficiency in at least one scripting language, preferably Python.
•Collaborate with research, engineering, and product teams to define evaluation goals aligned with user experience and product quality.
•Conduct failure analysis and uncover edge cases to improve model robustness.
•Contribute to our tools and infrastructure to automate and scale evaluation processes.

Minimum Qualifications

•BS and a minimum of 3 years of relevant industry experience.
•Strong experience in evaluating supervised, unsupervised, and deep learning models.
•Hands-on experience evaluating LLMs and using them as scoring/judging mechanisms.
•Familiarity with multimodal models (e.g., image + text, video + audio) and related evaluation challenges.
•Proficiency in Python and libraries such as NumPy, pandas, scikit-learn, PyTorch, or TensorFlow.
•Solid understanding of statistical testing, sampling, confidence intervals, and metrics (e.g., precision/recall, BLEU, ROUGE, FID, etc.).
•Strong documentation skills, including the ability to write technical reports and present to non-technical audiences.

Preferred Qualifications

•Experience working with open-source evaluation tools like OpenEval, ELO-based ranking, or LLM-as-a-Judge frameworks.
•Familiarity with prompt engineering, few-shot or zero-shot evaluation techniques.
•Experience evaluating generative models (e.g., text generation, image generation).
•Prior contributions to ML benchmarks or public evaluations.
•Strong interpersonal skills.

Skills & Tags

machine learning data

Aplyr's read

Apple is a tech giant known for its sleek design and innovation, attracting top talent in engineering, design, and business operations.
Synthesized from recent postings & public sources

What's promising

•Apple consistently leads in tech innovation with a strong focus on design and user experience.
•The company's global brand recognition offers employees a prestigious platform for career growth.
•Apple's robust ecosystem integrates hardware, software, and services, creating diverse job opportunities.

What to watch

•High-pressure work environment with demanding deadlines can impact work-life balance.
•Apple's secretive culture may limit transparency and cross-departmental communication.
•Dependence on hardware sales makes the company vulnerable to market saturation risks.

Why Apple

•Apple's design philosophy emphasizes simplicity and elegance, setting it apart in the tech industry.
•The company has a unique retail presence, with its own stores enhancing the customer experience.
•Apple's closed ecosystem creates a seamless integration across its products, unmatched by competitors.

Aplyr’s read is generated by AI from public sources.

About Apple

Apple

apple.com

Apple Inc. is a leading technology company known for its innovative consumer electronics, software, and services. The company designs and manufactures products such as the iPhone, iPad, Mac computers, and wearables, significantly influencing the tech industry and consumer behavior worldwide.