
Senior Machine Learning Operations Engineer

BetMGM · Hybrid, NJ
Full-time · Hybrid · Today
API Gateway · AWS · Caching · CI/CD · dbt · GitHub

About the role

The Senior MLOps Engineer treats ML systems as software systems and owns the path from a trained model to a production endpoint that meets its latency, cost, and reliability budgets - across both batch scoring (SageMaker Batch Transform, Snowflake Cortex / Snowpark ML, dbt-orchestrated scoring) and real-time inference (SageMaker real-time endpoints, Lambda + Bedrock, sub-second feature serving). The Senior Engineer builds the platform that data scientists and ML engineers ship on: a feature store with guaranteed online/offline parity, a model registry, CI/CD for ML, drift and quality monitoring, and champion/challenger and shadow deployment scaffolding. The role requires a software-engineering-first mindset: distributed systems, observability, and on-call instincts are the foundation, and ML literacy is what makes them effective here. GenAI integration experience is a plus, not a requirement.

Responsibilities

  • ML Production Platform
      • Stand up and operate BetMGM's ML platform on AWS (SageMaker Training, Model Registry, Pipelines, Endpoints, Batch Transform) and Snowflake (Snowpark ML, Cortex), with Terraform-managed infrastructure.
      • Build self-service scaffolds that let data scientists ship a model end-to-end without a ticket queue - cookie-cutter project templates with CI, drift monitoring, alerting, IaC, and Snowflake connectivity pre-baked.
  • Batch and Real-Time Inference
      • Design and operate batch scoring pipelines - SageMaker Batch Transform, dbt-orchestrated scoring against Snowflake, Snowpark ML - with explicit freshness and cost SLAs.
      • Design and operate real-time inference paths - SageMaker real-time endpoints, Lambda + Bedrock for GenAI, API Gateway - with stated latency budgets (typically sub-100ms) and graceful degradation under load.
      • Own the feature store (SageMaker Feature Store, Tecton, or Feast) with guaranteed online/offline parity - training-serving skew is treated as an incident, not a tradeoff.
  • CI/CD and Deployment Patterns
      • Build CI/CD for ML - model registry, automated retraining triggers, model versioning, lineage from feature → training run → deployed model → live prediction.
      • Implement champion/challenger, shadow deployments, and canary releases as platform primitives so individual model teams do not reinvent them per project.
  • Monitoring, Drift & Reliability
      • Stand up drift detection, data quality, and model performance monitoring (Evidently, Arize, or SageMaker Model Monitor - pick one and standardize) with paging that routes to humans who can fix it.
      • Own MLOps incident response - production model failures are SEV events with postmortems.
  • Cost and Performance
      • Right-size endpoints and tune batch caching, request batching, and autoscaling. State cost-per-prediction targets up front and meet them.
  • GenAI Integration (Plus, Not Required)
      • Integrate LLM APIs (Bedrock, Anthropic, OpenAI) into production paths - RAG pipelines, agent eval frameworks, prompt versioning, cost and latency observability.
      • Partner with the Helix team on AI personalization workloads as they ramp toward March Madness 2027.
  • AI in the Engineering Loop
      • Direct AI coding agents (Claude Code, Cursor, GitHub Copilot, dbt Copilot) as a force multiplier across infrastructure code, eval suites, and model-serving glue - designing work for agents to do, not just accepting their suggestions.
  • Collaboration
      • Partner with the data engineering team on shared standards (Terraform modules, CI/CD patterns, observability, lineage).
      • Work alongside data scientists and analytics partners to land the right interfaces between research and production, and be opinionated about where that boundary sits.
      • Coordinate with Entain India and contractor ML partners as workloads consolidate onto the BetMGM-owned platform.

Requirements

  • BS or MS in Computer Science, Math, Statistics, Machine Learning, or other

Benefits

  • Health insurance
  • Dental insurance
  • Vision insurance
  • 401(k)
  • Flexible schedule

Additional Information

Discover What's Possible at BetMGM

Ready to make your career legendary? Join us as we bring the magic of Vegas to our players. The BetMGM team has over 1,400 talented members, revolutionizing sports betting and online gaming in the United States and Canada. We're a brand with technology at our hearts and the most driven and focused talent in the business. As a valued team member, we're committed to giving you the resources and support you need to thrive.

Our benefits and perks include:

  • Medical, Dental, Vision, Life, and Disability Insurance
  • 401(k) with company match
  • Pre-tax spending accounts including health care FSA and commuter savings
  • Flexible paid time off
  • Professional development reimbursement and ongoing skills training opportunities
  • Employee resource groups
  • Swag, ticket giveaways, and more!

At BetMGM, we recognize that every individual plays a meaningful role in our success. That's why we're committed to building a respectful, inclusive workplace. It's the strategy behind every win. By meeting people where they are, we create a culture of belonging where everyone can thrive and a workplace that reflects our values, our people, and our drive to win.

