AI Engineer, Evaluation
Responsibilities
- Design and implement evaluation frameworks that enable Evaluation-Driven Development for AI systems deployed in customer environments
- Define how system quality is measured in each domain, ensuring that evaluation signals reflect real user needs, domain constraints, and business objectives
- Build and maintain golden test cases and regression suites in Python, using both human-authored and AI-assisted test generation to capture critical behaviors and edge cases. These test suites are treated as first-class system components that evolve alongside the AI system itself
- Define, calibrate, and operate LLM-based graders, aligning automated judgments with expert human assessments. Investigate where evaluation signals diverge from real-world outcomes and refine grading approaches to maintain signal quality as systems and domains evolve
- Work closely with Forward Deployed AI Engineers, Architects, Product Engineers, AI Strategists, and domain experts to ensure evaluation frameworks meaningfully guide system development and deployment in production
What We Require
- 2+ years of software engineering experience
- Strong Python Engineering Skills: You write clean, maintainable Python and are comfortable building evaluation and experimentation pipelines that run in production environments. You treat evaluation code with the same rigor as application code
- Experience with Evaluation-Driven or Experiment-Driven Development: You have used structured evaluation or experimentation frameworks to drive system iteration, and you understand the pitfalls of overfitting to metrics that don't reflect real outcomes
- Ability to Translate Human Judgment into Code: Work with subject matter experts to elicit high-quality judgments and encode them into test cases, scoring functions, and graders that scale
- Systems-Oriented Mindset: Understand how evaluation interacts with prompts, agents, data, and deployment. You design evaluation systems that support fast iteration while maintaining trust and safety in production
- AI-Native Working Style: Use AI tools to generate tests, analyze failures, explore edge cases, and accelerate debugging and iteration
- Travel: Travel 10-50% of the time, depending on the project, your role, and your level of interest in doing so
Requirements
- At Distyl, we build AI systems using Evaluation-Driven Development, an approach where evaluation is not an afterthought but the primary mechanism for iterating, improving, and trusting AI behavior in production.
- This role is for engineers who believe that AI systems only improve when measurement is tightly coupled to development, and who want to apply that philosophy directly to systems that matter.
Additional Information
About Distyl AI
Distyl is an applied AI technology company partnering with the world's most ambitious institutions to rearchitect critical operations for the frontier of AI. Our customers include the largest companies in telecom, healthcare, insurance, manufacturing, consumer goods, and global social organizations. We research and deploy technologies that power AI-native operations, both for our partners and for Distyl itself. Our work spans research into self-constructing systems, the development of the most reliable execution of AI systems, and products that transform mission-critical workflows. As a result, Distyl's technologies affect some of the world's largest operations, from hundreds of millions of consumer interactions to tens of millions of supply chain transactions and millions of patient journeys. Distyl is backed by leading investors including Lightspeed Venture Partners, Khosla Ventures, Coatue, DST Global, and the board members of 20+ F500s. The results reflect this approach: Distyl has a 100% production deployment success rate for its customers and is one of the few enterprise AI companies to run a profitable business.