Senior Machine Learning Engineer, Synthetic Data & Document Understanding

ABBYY · Bangalore, India

Full-time · Hybrid · Posted 1mo ago (30+ days old, may be filled)

Leadership · Machine Learning · Python · PyTorch · Statistical Analysis

About the role

We are seeking a Senior Machine Learning Engineer - Synthetic Data & Document Understanding to own the synthetic data generation track within ABBYY's Document AI Data team. This role focuses on building generative pipelines that produce high-quality, diverse, and realistic synthetic training data at scale. You will ensure synthetic data meaningfully improves downstream model performance by maintaining strong alignment with real-world document structures, formats, and statistical properties. This is an ideal role for engineers who combine deep generative modeling expertise with rigorous data quality evaluation and production engineering skills.

Responsibilities

Technical Development & Innovation

Design and implement pipelines that analyze real documents to inform high-fidelity synthetic data generation
Build generative systems capable of producing documents across diverse formats, layouts, and domains
Develop evaluation frameworks to ensure synthetic data maintains distributional fidelity and diversity
Research and apply generative modeling techniques suited for document AI training
Identify and mitigate quality issues to ensure synthetic data is effective for downstream model training
Partner with Modeling teams to measure the impact of synthetic data on model performance

Project Ownership & Leadership

Own the synthetic data generation track end-to-end, from architecture to quality validation
Drive architectural decisions balancing quality, diversity, scale, and cost efficiency
Define and maintain data quality metrics and generation dashboards
Collaborate closely with annotation teams to ensure compatibility with downstream pipelines
Contribute to roadmap planning alongside Principal-level leadership

Infrastructure & Scale

Build scalable pipelines capable of generating millions of synthetic training examples
Implement post-processing, filtering, and validation mechanisms to remove low-quality outputs
Design cost-efficient workflows balancing compute, quality, and throughput
Develop monitoring systems to detect distribution shifts or quality degradation over time
Collaborate with Platform teams on compute orchestration, storage, and scheduling

Requirements

Education & Experience

MS or PhD in Computer Science, Engineering, Mathematics, or a related field
5+ years of experience in Machine Learning / AI, with a focus on generative models, Vision-Language Models (VLMs), and synthetic data systems
Proven experience building and evaluating synthetic data pipelines for ML training
Strong background in data quality evaluation and statistical analysis

Technical Expertise

Deep expertise in Vision-Language Models and document understanding (layout, structure, semantics)
Strong knowledge of generative modeling for structured and semi-structured data
Understanding of what makes synthetic data valuable: distributional fidelity, diversity, realistic noise patterns, and domain coverage
Strong programming skills in Python with experience in PyTorch or similar frameworks
Experience evaluating data quality via automated metrics and downstream model impact
Familiarity with large-scale data pipelines, cloud environments, and experiment tracking

Leadership & Communication

Proven ability to independently own complex technical workstreams
Strong collaboration across data, modeling, and platform teams
Ability to clearly communicate data quality and generation trade-offs
Data-driven mindset with strong attention to coverage gaps and quality signals

Here are some of our local benefits:

Comprehensive medical, accidental, and life insurance
Weekly wellness sessions to support your physical and mental well-being
A generous paid time off policy

Join ABBYY, and you will:

Love how you work

We provide remote and hybrid working options to fit all lifestyles.
We use flexible hours across most of our teams to allow you to find your own definition of balance.
Encouraging a culture of giving, we provide two paid volunteering days off every year so you can take time to contribute to the causes you care about.
To ensure your family is cared for, we offer paid parental leave in all our locations.
Love whom y

Benefits

Dental insurance · Vision insurance · Remote work options · Flexible schedule · Parental leave

Additional Information

Join ABBYY and be part of a team that celebrates your unique work style. With flexible work options, a supportive team, and rewards that reflect your value, you can focus on what matters most - driving your growth, while fueling ours. Our commitment to respect, transparency, and simplicity means you can trust us to always choose to do the right thing. As a trusted partner for purpose-built AI and intelligent automation, we solve highly complex problems for our enterprise customers and put their information to work to transform the way they do business. Over 10,000 customers trust ABBYY, including many Fortune 500 companies. You will help grow a portfolio that already includes clients such as DHL, Johnson & Johnson, FDA, DMV, PwC, KeyBank, Spotify, and H&R Block.
