Machine Learning Safety: Evaluation Research Engineer

Apple · San Francisco, CA

Full-time · On-site · 2w ago

Classification · Compliance · Data Analysis · Documentation

About the role

You will play an impactful role: shaping responsible AI and safety policies, evaluating fidelity to product safety requirements, creating risk assessments and taxonomies, curating exemplar safety evaluation datasets, and ensuring that evaluation frameworks are culturally and linguistically grounded. The ideal candidate has a strong understanding of issues in responsible AI and in AI and society, as well as of technology evaluation design principles and practices, and brings experience designing evaluations to support policies and/or product requirements, classification systems, and annotation and/or study participant guidelines.

Responsibilities

Scalable Analysis & Reporting Automation: Create automated pipelines for analysis and reporting that reduce manual effort, increase reproducibility, and enable rapid cross-market safety assessments. Build tooling that integrates with existing dashboards and reporting workflows.
Documentation & Communication: Produce clear, detailed documentation artifacts. Present findings and recommendations to cross-functional stakeholders including engineering, product, compliance, and policy teams.
Monitoring & Data Quality: Develop and implement monitoring frameworks to track evaluation progress, annotator performance, inter-rater agreement, and data quality in real time. Flag anomalies and implement corrective actions to maintain data integrity across markets.

Requirements

Experience designing evaluation frameworks for multilingual or cross-cultural contexts.
Familiarity with responsible AI, AI safety, or content moderation policy frameworks.
Experience with experimental design methodologies, inter-rater reliability data analysis, and annotation quality assessment methods.
Prior experience working with localization, internationalization, or language service teams.
Experience with survey design, AI policy development, and/or structured content analysis methodologies.
4+ years of experience in an applied research setting related to evaluation design, AI ethics, responsible AI, AI safety, computational social science, content analysis, or a closely related field.
Strong understanding of taxonomy design, classification systems, and annotation meth

Additional Information

This role supports the design and development of safety evaluation methodologies for generative and agentic AI features that enable users across the globe to interact with our media products and services.

