Senior AI Engineer - LLM Evaluation


Autodesk · Bengaluru, India

Full-time · On-site · Today

AWS · Azure · CI/CD · Deep Learning · GCP · Generative AI

Responsibilities

Lead AI evaluation and quality strategy for Generative AI and Machine Learning (ML)-powered systems
Design and implement scalable evaluation frameworks for Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) systems, agentic workflows, and traditional Machine Learning (ML) models
Fine-tune, evaluate, benchmark, and deploy Large Language Models (LLMs) in production environments
Implement evaluation and observability standards, including regression testing, monitoring, tracing, feedback loops, and quality measurement for AI systems
Define and track AI quality metrics, including accuracy, robustness, bias, drift, grounding, latency, safety, and hallucination rates
Build automated evaluation pipelines and integrate them into Continuous Integration and Continuous Deployment (CI/CD) workflows for continuous model validation
Evaluate and adopt AI testing frameworks, benchmark methodologies, and emerging evaluation tools
Build scalable Retrieval-Augmented Generation (RAG) pipelines using structured and unstructured enterprise data sources
Integrate AI agents with Autodesk platforms and enterprise services using Model Context Protocol (MCP)-based integrations or equivalent architectures
Partner with Product Managers, Architects, Software Engineers, Experience Designers, and Data Scientists to deliver AI-powered experiences
Ensure AI systems align with Responsible AI principles and enterprise governance requirements
Contribute to reusable AI platform capabilities, Software Development Kits (SDKs), orchestration patterns, and engineering best practices
Leverage AI coding assistants to accelerate feature delivery while maintaining rigorous validation through testing, Continuous Integration and Continuous Deployment (CI/CD), code reviews, and security best practices

Requirements

Bachelor's or Master's degree in Computer Science, Machine Learning, Data Science, Engineering, or equivalent practical experience
5+ years of hands-on experience delivering Artificial Intelligence (AI) and Machine Learning (ML)-powered systems into production
Experience designing evaluation frameworks, automated testing solutions, benchmarking systems, or model validation pipelines
Strong understanding of Large Language Model (LLM) evaluation, model quality measurement, and AI observability concepts
Strong proficiency in Python with experience building automation frameworks and testing infrastructure
Experience deploying and operating AI applications in production, including monitoring, observability, and continuous improvement
Experience building cloud-native applications on Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform (GCP)
Excellent technical communication and collaboration skills
Demonstrated experience using AI coding tools to accelerate the delivery of production systems
Hands-on experience with AI evaluation frameworks such as LangSmith, DeepEval, TruLens, Promptfoo, OpenAI Evals, or similar tools
Experience evaluating Retrieval-Augmented Generation (RAG) systems, agentic workflows, AI assistants, and prompt-based applications
Familiarity with Responsible AI, AI governance, model risk assessment, and AI observability platforms
Experience deploying cloud-native AI solutions on Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform (GCP)
Familiarity with deep learning architectures such as Transformers and modern Machine Learning (ML) frameworks, including PyTorch, Lightning, Ray, or equivalent technologies
Experience with enterprise knowledge systems, semantic search, vector search, or contextual retrieval systems
Experience with emerging agentic AI frameworks and protocols such as LangGraph, CrewAI, the OpenAI and Claude Software Development Kits (SDKs), Model Context Protocol (MCP), Agent-to-Agent (A2A), or equivalent technologies
Contributions to open-source AI projects, technical blogs, conference presentations, or published research are an advantage

Additional Information

Job Requisition ID # 26WD98447

Position Overview

If you love building real Artificial Intelligence (AI) systems that solve real customer problems and are passionate about measuring, improving, and scaling AI quality, this role is for you. The Applied AI team within Autodesk's Architecture, Engineering, and Construction (AEC) organization builds cloud-native AI solutions that make industry workflows smarter, faster, and more connected.

This role is primarily focused on AI evaluation and quality engineering for Generative AI systems. You will establish evaluation standards, build automated validation frameworks, and drive observability practices that ensure AI systems are reliable, trustworthy, and production-ready. In addition to AI evaluation, you will contribute to the design and development of AI-powered applications, including Retrieval-Augmented Generation (RAG) systems, agentic workflows, and Large Language Model (LLM)-based solutions deployed at scale.

