Senior AI Engineer - LLM Evaluation

Autodesk · Bengaluru, India
Full-time · On-site · Posted today
AWS · Azure · CI/CD · Deep Learning · GCP · Generative AI

Responsibilities

  • Lead AI evaluation and quality strategy for Generative AI and Machine Learning (ML)-powered systems
  • Design and implement scalable evaluation frameworks for Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) systems, agentic workflows, and traditional ML models
  • Fine-tune, evaluate, benchmark, and deploy LLMs in production environments
  • Implement evaluation and observability standards, including regression testing, monitoring, tracing, feedback loops, and quality measurement for AI systems
  • Define and track AI quality metrics, including accuracy, robustness, bias, drift, grounding, latency, safety, and hallucination rates
  • Build automated evaluation pipelines and integrate them into Continuous Integration and Continuous Deployment (CI/CD) workflows for continuous model validation
  • Evaluate and adopt AI testing frameworks, benchmark methodologies, and emerging evaluation tools
  • Build scalable RAG pipelines using structured and unstructured enterprise data sources
  • Integrate AI agents with Autodesk platforms and enterprise services using Model Context Protocol (MCP)-based integrations or equivalent architectures
  • Partner with Product Managers, Architects, Software Engineers, Experience Designers, and Data Scientists to deliver AI-powered experiences
  • Ensure AI systems align with Responsible AI principles and enterprise governance requirements
  • Contribute to reusable AI platform capabilities, Software Development Kits (SDKs), orchestration patterns, and engineering best practices
  • Leverage AI coding assistants to accelerate feature delivery while maintaining rigorous validation through testing, CI/CD, code reviews, and security best practices

Requirements

  • Bachelor's or Master's degree in Computer Science, Machine Learning, Data Science, Engineering, or equivalent practical experience
  • 5+ years of hands-on experience delivering Artificial Intelligence (AI) and Machine Learning (ML)-powered systems into production
  • Experience designing evaluation frameworks, automated testing solutions, benchmarking systems, or model validation pipelines
  • Strong understanding of Large Language Model (LLM) evaluation, model quality measurement, and AI observability concepts
  • Strong proficiency in Python with experience building automation frameworks and testing infrastructure
  • Experience deploying and operating AI applications in production, including monitoring, observability, and continuous improvement
  • Experience building cloud-native applications on Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform (GCP)
  • Excellent technical communication and collaboration skills
  • Demonstrated experience using AI coding tools to accelerate the delivery of production systems
  • Hands-on experience with AI evaluation frameworks such as LangSmith, DeepEval, TruLens, Promptfoo, OpenAI Evals, or similar tools
  • Experience evaluating Retrieval-Augmented Generation (RAG) systems, agentic workflows, AI assistants, and prompt-based applications
  • Familiarity with Responsible AI, AI governance, model risk assessment, and AI observability platforms
  • Experience deploying cloud-native AI solutions on AWS, Microsoft Azure, or GCP
  • Familiarity with deep learning architectures such as Transformers and with modern ML frameworks such as PyTorch, Lightning, Ray, or equivalent technologies
  • Experience with enterprise knowledge systems, semantic search, vector search, or contextual retrieval systems
  • Experience with emerging agentic AI frameworks and protocols such as LangGraph, CrewAI, the OpenAI and Claude Software Development Kits (SDKs), Model Context Protocol (MCP), Agent-to-Agent (A2A), or equivalent technologies
  • Contributions to open-source AI projects, technical blogs, conference presentations, or published research are an advantage
  • The I

Additional Information

Job Requisition ID #26WD98447

Position Overview

If you love building real Artificial Intelligence (AI) systems that solve real customer problems and are passionate about measuring, improving, and scaling AI quality, this role is for you.

The Applied AI team within Autodesk's Architecture, Engineering, and Construction (AEC) organization builds cloud-native AI solutions that make industry workflows smarter, faster, and more connected.

This role is primarily focused on AI evaluation and quality engineering for Generative AI systems. You will establish evaluation standards, build automated validation frameworks, and drive observability practices that ensure AI systems are reliable, trustworthy, and production-ready. In addition to AI evaluation, you will contribute to the design and development of AI-powered applications, including Retrieval-Augmented Generation (RAG) systems, agentic workflows, and Large Language Model (LLM)-based solutions deployed at scale.
