Senior Machine Learning Engineer, ML Infrastructure - Online
About the role
Unity Vector builds ML infrastructure that powers real-time prediction, experimentation, attribution, and AI-driven decision-making across the company. Our online ML systems serve production models at scale, supporting low-latency inference, large-scale experimentation, model deployment and optimization, feature processing, and business-critical decisioning. As model complexity, traffic volume, and experimentation velocity continue to grow, our inference platform must remain reliable, scalable, observable, and cost-efficient. To support this growth, we need strong technical ownership to evolve the online ML infrastructure that enables ML teams to safely deploy, validate, and operate production models at scale.

We are seeking a senior/staff ML engineer to design and evolve Unity Vector's online model inference platform. This role focuses on building reliable infrastructure for serving machine learning models in production, optimizing inference performance, and enabling safe, efficient experimentation across high-traffic online systems.

You will work closely with ML engineers, platform teams, and product stakeholders to ensure models can be deployed, scaled, monitored, and iterated on efficiently. You will play a key role in shaping how models are packaged, served, validated, monitored, and optimized in production environments. This role requires strong systems thinking, deep experience with production ML infrastructure, and the ability to drive architectural improvements across teams.
Responsibilities
- Design and operate large-scale online inference infrastructure that serves production ML models with low latency and high reliability.
- Build and improve model serving systems using technologies such as PyTorch, NVIDIA Triton Inference Server, Kubernetes, GKE, and Ray, or comparable serving and orchestration tools.
- Optimize inference performance through batching, model compilation, GPU/CPU utilization improvements, request scheduling, and runtime-level tuning.
- Develop infrastructure for model deployment, canary testing, A/B experimentation, traffic splitting, rollback, and production validation.
- Improve observability of online ML systems through latency, throughput, error-rate, cost, saturation, and model-health monitoring.
- Build self-healing and autoscaling capabilities to support dynamic experiment traffic, changing model complexity, and production reliability requirements.
- Partner closely with ML engineers to support faster model iteration while maintaining production safety, scalability, and cost efficiency.
- Improve the reliability and reproducibility of model serving workflows, including model packaging, artifact validation, compatibility testing, and deployment automation.
- Lead architectural improvements that make the online ML platform more robust, user-friendly, scalable, and cost-efficient.
Requirements
- Strong experience building and operating production-grade online ML inference systems.
- Experience with model serving frameworks such as NVIDIA Triton Inference Server, TorchServe, Ray Serve, TensorFlow Serving, or similar systems.
- Experience optimizing inference workloads using techniques such as dynamic batching, model compilation, quantization, GPU acceleration, GPU kernel optimization, caching, or runtime tuning.
- Strong experience with distributed systems, Kubernetes, autoscaling, service reliability, and production observability.
- Strong programming skills in Python, with practical experience working on production ML systems and high-scale services.
- Experience with PyTorch and modern model deployment workflows, including model packaging, validation, and serving lifecycle management.
- Experience designing infrastructure for safe model rollout, canary testing, A/B experimentation, and automated rollback.
- Strong systems thinking, with the ability to reason about latency, throughput, reliability, scalability, and cost tradeoffs in online systems.
- Proven ability to lead technical direction and influence architectural decisions across teams without formal authority.
Additional information
- Relocation support is not available for this position.
- Work visa/immigration sponsorship is not available for this position.