Senior Machine Learning Engineer, ML Infrastructure - Offline

Unity · Shanghai, China

Full-time · On-site · 1w ago

Airflow · Machine Learning · Observability · Python · PyTorch · Spark

About the role

Unity Vector builds an offline ML platform that powers insight, experimentation, attribution, and AI-driven decision-making across the company. Our systems operate at scale across batch and streaming data, supporting analytics, product intelligence, machine learning pipelines, and business operations. As data volume and complexity grow, our platform also supports large-scale model training, feature generation, and experimentation workflows that power production ML systems. To support this growth, we need strong technical ownership to ensure our ML pipelines remain reliable, scalable, and architecturally sound.

We are seeking a senior ML engineer to design and evolve this large-scale offline platform. The role focuses on building reliable infrastructure for generating training datasets, orchestrating ML workflows, and enabling efficient distributed model training at scale. You will work closely with ML engineers and platform teams to ensure our pipelines can handle growing data volumes and increasingly complex training workloads. You will play a key role in shaping how training datasets are prepared, validated, and delivered to distributed training systems, and how models are trained, while ensuring the reliability, scalability, and performance of our offline ML platform.

Responsibilities

Design and operate large-scale data pipelines that generate datasets for model training and experimentation
Develop infrastructure that supports distributed training workflows using technologies such as PyTorch, Ray Data, and Ray Train
Integrate ML pipelines with workflow orchestration systems (e.g., Flyte, Airflow, or similar) to enable reliable multi-stage training workflows
Improve reproducibility and observability of ML pipelines through dataset validation, monitoring, and automated testing
Optimize performance and resource utilization across distributed compute systems used for data processing and model training
Partner closely with ML engineers to enable efficient large-scale experimentation and model iteration
Lead architectural improvements to ensure our offline ML pipelines remain scalable, reliable, and cost-efficient

Requirements

Experience working with distributed computing frameworks such as Ray, Spark, or Flink, and familiarity with the Ray ecosystem (Ray Data, Ray Train) for distributed data processing and model training
Experience building and optimizing large-scale distributed ML training pipelines using techniques such as Torch compilation, quantization, CUDA, and GPU kernel optimization
Experience building infrastructure for training data generation, dataset preparation, or ML feature pipelines
Deep experience designing and operating production-grade data pipelines
Strong programming skills in Python and experience working with large-scale distributed workloads
Experience with modern data infrastructure (data lakes, warehouses, orchestration systems, streaming platforms)
Strong systems thinking, with the ability to reason about performance, scalability, reliability, and cost tradeoffs in distributed systems
Proven ability to lead technical direction and influence architectural decisions across teams without formal authority

Additional information

Relocation support is not available for this position
Work visa/immigration sponsorship is not available for this position

Benefits

At Unity, we want our team members to thrive. We offer a wide range of benefits designed to support well-being and work-life balance.

Health insurance
Paid time off
Equity / stock options

Please note: Benefits eligibility, specific offerings, and coverage vary based on the country and employment status.

Life at Unity

Unity is an equal opportunity employer committed to fostering an inclusive, innovative environment with the best employees. Therefore, we provide employment opportunities without regard to age, race, colo…
