AI Quality Engineer

Allegis Global Solutions · Bengaluru, India

Full-time · On-site · 3w ago

Azure · CI/CD · Docker · Git · GitHub · Incident Response

About the role

Testing AI systems is a fundamentally different problem from testing traditional software. Outputs are non-deterministic. "Correct" is often a spectrum. And the failure modes (hallucinations, drift, prompt injection) don't show up in unit tests. We need an engineer who understands this and can build the testing strategies, evaluation frameworks, and quality infrastructure to keep our agents reliable in production.

As an AI Quality Engineer, you'll design how we test intelligent agents, agentic workflows, and Foundation Layer capabilities. This is not a manual QA role: you'll write code, build evaluation pipelines, and create automated testing frameworks that run in CI/CD. You'll define what "quality" means for AI systems at AGS and build the systems to measure it.

You'll work across every solution the team builds, which gives you broad visibility into the architecture and a deep understanding of how our agents behave in the real world. If you're an engineer who cares about quality and wants to solve testing problems that most teams haven't figured out yet, this is the role.

Responsibilities

Testing Strategy & Design
Define testing strategies for AI agents, conversational interfaces, and agentic workflows
Design behavioral test suites for non-deterministic outputs, where "correct" isn't binary
Build evaluation frameworks that measure groundedness, factuality, relevance, and task completion
Identify failure modes specific to AI systems: hallucinations, prompt injection, context window limitations, drift
Develop testing approaches for each architecture pattern: RAG, function calling, human-in-the-loop, autonomous workflows
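
One way to picture a behavioral test for non-deterministic output is to sample the agent several times and assert on a pass rate instead of an exact string. The sketch below is illustrative only: `pass_rate`, `make_stub_agent`, `CHECKS`, and the 0.6 threshold are hypothetical names and values, and the stub stands in for a real LLM-backed agent.

```python
import random
from typing import Callable, List

Agent = Callable[[str], str]
Check = Callable[[str], bool]


def pass_rate(agent: Agent, prompt: str, checks: List[Check], samples: int = 20) -> float:
    """Return the fraction of sampled outputs that satisfy every check."""
    if samples <= 0:
        raise ValueError("samples must be positive")
    passed = 0
    for _ in range(samples):
        output = agent(prompt)
        if all(check(output) for check in checks):
            passed += 1
    return passed / samples


def make_stub_agent(seed: int) -> Agent:
    """Hypothetical stand-in for an agent: returns one of several canned replies."""
    rng = random.Random(seed)
    replies = [
        "Your interview is scheduled for Monday at 10:00.",
        "The interview is on Monday, 10:00.",
        "I could not find an interview on your calendar.",
    ]

    def agent(prompt: str) -> str:
        return rng.choice(replies)

    return agent


# Checks describe properties of an acceptable answer, not one exact wording.
CHECKS: List[Check] = [
    lambda out: "monday" in out.lower(),
    lambda out: "10:00" in out,
]

if __name__ == "__main__":
    rate = pass_rate(make_stub_agent(seed=0), "When is my interview?", CHECKS, samples=50)
    threshold = 0.6  # assumed threshold; a real one would be agreed per solution
    print(f"pass rate: {rate:.2f} (threshold {threshold})")
```

In a pytest suite, the same function could back a test that fails when the rate drops below the agreed threshold.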
Test Automation & Infrastructure
Build automated evaluation pipelines that run as part of CI/CD
Create test harnesses for LLM-based systems: mocking, fixtures, and reproducible test scenarios
Develop regression suites that detect quality degradation when prompts, models, or data change
Build monitoring and alerting for production agent quality (accuracy, latency, error rates)
Maintain test infrastructure: test data management, environment setup, reporting dashboards
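
A regression gate of the kind described above can be sketched as a comparison of current evaluation scores against a stored baseline. This is a minimal illustration under assumptions: `find_regressions`, the metric names, and the 0.02 tolerance are hypothetical, and it assumes higher scores are better for every metric.

```python
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: float = 0.02,
) -> List[str]:
    """List metrics whose current score fell more than `tolerance` below baseline.

    Assumes higher is better for every metric. A metric missing from the
    current run is also reported, since a silent gap can hide a regression.
    """
    regressions = []
    for metric, base_score in baseline.items():
        score = current.get(metric)
        if score is None:
            regressions.append(f"{metric}: missing from current run")
        elif score < base_score - tolerance:
            regressions.append(f"{metric}: {base_score:.2f} -> {score:.2f}")
    return regressions


if __name__ == "__main__":
    # Illustrative numbers only, not real evaluation results.
    baseline = {"groundedness": 0.90, "relevance": 0.85}
    current = {"groundedness": 0.85, "relevance": 0.86}
    problems = find_regressions(baseline, current)
    for line in problems:
        print("REGRESSION", line)
    # A CI job could exit non-zero here to block the change.
    raise SystemExit(1 if problems else 0)
```

The tolerance absorbs run-to-run noise from sampling; how wide it should be depends on how many evaluation examples back each score.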
Evaluation & Metrics
Define quality metrics for each solution: what to measure and what thresholds matter
Build and maintain evaluation datasets (ground truth, reference outputs, edge case collections)
Conduct systematic prompt evaluation when prompts or models change
Track quality trends over time and identify when re-evaluation is needed
Report quality metrics to the team and stakeholders in clear, actionable terms
Collaboration & Quality Culture
Partner with AI Solutions Engineers to define testability requirements during design
Work with AI Solutions Analysts to translate acceptance criteria into test scenarios
Review solution designs from a quality and testability perspective
Advocate for quality practices across the team: testing isn't an afterthought, it's part of delivery
Contribute to incident response by diagnosing quality failures and building regression tests

Requirements

Required
3-7 years of software engineering or quality engineering experience
Strong programming skills in Python and/or TypeScript; you write test code, not just test cases
Experience designing and building automated test frameworks
Understanding of AI/ML systems; you know why testing LLM outputs is different from testing deterministic code
Experience with CI/CD pipelines and integrating automated tests into build processes
Ability to reason about non-deterministic systems and design meaningful quality metrics
Strong analytical skills; you can look at agent outputs and determine whether they're good enough
Preferred
Experience testing AI/ML applications, conversational interfaces, or chatbots
Background in LLM evaluation: prompt testing, groundedness scoring, factuality checking
Familiarity with evaluation frameworks (DeepEval, Ragas, custom evaluation pipelines)
Experience testing Microsoft Power Platform solutions (Power Automate, Copilot Studio)
Background in Azure services and cloud-based test infrastructure
Experience with load testing and performance testing for API-based systems
Familiarity with staffing, HR tech, or workforce management domains

Technology Stack

Languages: Python, TypeScript
Platforms: Azure (Container Apps, Functions, AI Services), Microsoft 365
Testing: pytest, evaluation frameworks (DeepEval, Ragas, custom), load testing tools
AI/ML: LLM evaluation, prompt testing, RAG evaluation, behavioral testing
Data: REST APIs, Dataverse, SQL
Tools: Git, GitHub, CI/CD pipelines, Docker, monitoring/alerting (Application Insights)
We don't expect expertise in everything. AI quality engineering is a new discipline; we expect strong engineering fundamentals and the ability to figure out new problems.

What We're NOT Looking For

Manual testers who write test cases in spreadsheets
QA professionals who treat testing as a gate at the end of development rather than a practice woven into it
People who expect deterministic pass/fai
