Sr Machine Learning Engineer, Tech Lead - Autograder Systems, Evaluation

Apple · Cupertino, CA

Full-time · On-site · 3w ago

Generative AI · Leadership · Machine Learning · Python · PyTorch

About the role

In this role you will focus on:

Technical Leadership

Define and drive the technical roadmap for autograder quality - researching and introducing novel methods such as reward modeling, LLM-as-judge, preference learning, and calibration techniques to measurably improve evaluation accuracy.
Architect and lead the build-out of a scalable autograder training pipeline encompassing data curation, model fine-tuning, evaluation harnesses, and versioning.
Design and own the hillclimbing system that iteratively improves autograder performance through systematic prompt and model optimization loops.
Establish quality benchmarks, confidence metrics, and failure analysis frameworks that enable the team to track, trust, and act on autograder outputs.

People & Collaboration

Mentor and technically guide a team of MLEs through design reviews, modeling standards, and hands-on problem-solving - fostering a culture of rigor and continuous learning.
Partner with data annotation teams to define labeling guidelines that feed autograder training.
Collaborate with feature engineers to align autograder signals with broader training and product objectives.
Translate complex technical trade-offs into clear narratives for engineering, product, and leadership audiences.

Requirements

Strong ML systems instincts - you care deeply about data quality, reproducibility, latency, and scale.
Background in human-in-the-loop annotation pipelines and inter-annotator agreement analysis.
Prior experience on an evaluation infrastructure or model quality team.
Master's or PhD in Computer Science, Machine Learning, Artificial Intelligence, or a related field.
5+ years of industry experience in machine learning, with a strong focus on LLM or VLM systems.
Deep expertise in prompt-tuning and fine-tuning techniques (SFT, RLHF, DPO, or equivalent), with proven experience in model calibration and uncertainty estimation.
Familiarity with data flywheel design - leveraging model outputs to continuously improve future training data.
Proficiency in Python and ML frameworks (PyTorch preferred).

Pay & Benefits

Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.

Additional Information

We are looking for a Senior MLE Tech Lead to join a centralized evaluation organization and define the next generation of autograder quality across 20+ of Apple's most visible generative AI features. You will own the end-to-end technical vision for how we evaluate model outputs at scale - pioneering state-of-the-art methods, raising the technical bar, and leading a team of talented MLEs to build a robust autograder training and hillclimbing system from the ground up. This is a high-impact, hands-on leadership role at the intersection of model evaluation, data quality, and ML systems engineering. You will work closely with model developers, data teams, and product partners to ensure our autograders are fast, accurate, and continuously improving - directly shaping the quality of AI experiences used by hundreds of millions of people.
