ML Ops Engineer
Requirements
- Experience with industrial, IoT or manufacturing platforms.
- Experience with feature stores, model registries, dataset versioning, and lineage systems.
- Experience with AI agents, RAG systems, production GenAI applications, or evaluation frameworks.
Why This Role Matters
- This role is an opportunity to help build the production foundation behind Augury's Industrial AI Workforce.
- You will help transform ML and AI work from isolated experimentation into scalable, observable, reliable, and reusable production systems powering the next generation of industrial AI.
- If you enjoy building production-grade AI platforms, scaling ML systems on modern data infrastructure, and shaping the operational backbone of Industrial AI, we would love to meet you.
- Augury is a proud equal opportunity employer. We strive to create a work environment in which everyone, including all applicants, employees, customers, guests, and vendors, feels safe and comfortable. We commit to maintaining a workplace that is free of a
Additional Information
Our mission is to transform how people and machines work together to push the boundaries of human productivity. A leader in Industrial AI, Augury helps the world's manufacturers leverage real-time production insights to drive new levels of efficiency. Combining predictive and prescriptive AI technology with industry expertise, production teams can proactively address alerts, minimize downtime, reduce asset costs, and maximize yield and capacity. Our customers achieve payback in six months or less, enabling global scale. We're looking for team members excited to partner with the world's manufacturers and build the future of production together.
We are looking for an MLOps Engineer with strong production engineering experience building and operating scalable ML and AI systems. This is a software-first MLOps platform role focused on production reliability, ML lifecycle management, large-scale training infrastructure, operational AI systems, and reusable platform capabilities. You will help build and scale the production platform behind Augury's Industrial AI Workforce, enabling teams across the company to develop, evaluate, deploy, and operate ML and AI systems consistently and safely.
A Day In Your Life
- Design and evolve production MLOps capabilities across the full ML lifecycle, including datasets, features, models, evaluations, deployments, monitoring, retraining, and feedback signals.
- Build systems for experiment tracking, artifact management, reproducibility, versioning, lineage, promotion workflows, and production readiness.
- Develop reusable platform tooling, golden paths, and engineering standards that improve consistency and delivery velocity across teams.
- Build operational infrastructure for LLM and agentic systems, including prompts, tools, traces, evaluations, observability, safety boundaries, and production monitoring.
- Design evaluation and monitoring frameworks for AI systems, covering answer quality, latency, grounding, reliability, and operational regressions.
- Build and optimize large-scale training pipelines supporting heterogeneous data sources and scalable compute patterns.
- Write clean, modular, production-grade Python services and platform libraries.
- Drive engineering quality through automated testing, CI/CD, observability, deployment standards, and operational best practices.
What You Bring
- 5+ years of professional software engineering, MLOps, or ML platform engineering experience in production environments.
- Significant experience building or owning production ML infrastructure and lifecycle systems.
- Strong Python engineering skills with production-grade architecture, modular design, testing, packaging, and robust error handling.
- Strong understanding of the end-to-end ML lifecycle, including training, deployment, monitoring, retraining, reproducibility, and lineage.
- Experience working with large-scale data platforms such as Databricks, Spark, Delta Lake, or equivalent ecosystems.
- Experience with ML platform and MLOps frameworks such as MLflow, Metaflow, Kubeflow, or equivalent ML lifecycle-management systems.
- Proven ability to design reusable workflow orchestration using Airflow, Metaflow, or Databricks, covering automation, scheduling, dependency management, and production reliability.
- Familiarity with operational patterns for LLMOps, AgentOps, and production AI systems.
- Strong written and verbal communication skills in English.