Lead, Data Science Operations

External

Echo · Chicago, IL

Full-time · Remote · 5d ago

Airflow · AWS · Azure · CI/CD · Docker · Documentation

Requirements

Required
Hands-on experience operating ML or software systems in production: an MLOps, DevOps, SRE, platform, or data science background with demonstrated production ownership.
Strong working knowledge of CI/CD pipelines, deployment automation, and a major cloud platform (AWS, Azure, or GCP).
Demonstrated expertise in error handling, fault tolerance, and designing systems that fail gracefully (retries, fallbacks, alerting, monitoring/observability).
Proficiency in Python (R a plus), and a working understanding of how ML models are packaged, served, monitored, and retrained.
Comfort serving as first point of contact for production issues, including an on-call / off-hours expectation.
A teaching disposition, with the ability to translate complex infrastructure into clear guidance for colleagues who are not infrastructure specialists.
Preferred
Experience standing up monitoring and observability for a portfolio of production models or services (e.g., drift detection, performance tracking, alerting).
Familiarity with containerization (Docker) and orchestration (Kubernetes), infrastructure-as-code, and model-serving frameworks.
Familiarity with MLOps tooling such as MLflow, Airflow, or Kubeflow, or managed equivalents (e.g., SageMaker, Vertex AI), and with data/model versioning.
Experience working across an engineeri

Benefits

Health insurance

Additional Information

The Data Science Operations Lead sits at the intersection of Data Science, Engineering, and IT Architecture: a senior individual-contributor role focused on the operational side of the model lifecycle, including deployment, monitoring, scaling, and maintenance. Echo runs a growing portfolio of models in production, and this role exists to keep that portfolio reliable, observable, and well-governed without pulling our Data Scientists away from building new capabilities. The Lead is the team's resource for moving models from R&D to production services, the first line on production issues, and the standing point of contact with Architecture on everything deployment- and reliability-related.

What You'll Own

Model deployment partnership. Serve as Data Science's primary counterpart to the Architecture / Platform Engineering team on model deployment. Own the day-to-day collaboration, hand-offs, and coordination. Data Scientists typically hand off a trained model and its training data. Engineering needs a running service: an API, a web tool, something the business can call. Your job is to bridge that gap.

Production reliability and incident response. Act as first point of contact for production issues (outages, errors, degraded endpoints) across all deployed models and endpoints. This role carries an explicit on-call / off-hours availability expectation; production issues don't keep business hours, and shielding the development team from that interruption is central to the job.

Resilient, error-aware systems. Bring rigor to error handling and fault tolerance. Design and enforce practices that prevent errors before they happen and ensure models and endpoints degrade or fail gracefully, with sensible fallbacks, retries, alerting, and recovery paths.

Monitoring and observability. Establish and maintain the monitoring and observability needed to manage a portfolio of production models as an enterprise capability by tracking model health, endpoint performance, latency, logging, and prediction quality.

Deployment expertise and team enablement. Develop a detailed, working understanding of the deployment system as it continues to evolve, and act as the team's guide. Help Data Scientists move from experiment to production quickly and safely, and drive the templating, documentation, and automation that reduce the time the team spends on infrastructure.

Governance and quality. Own versioning, reproducibility, and operational governance for models in production, partnering with Architecture on the standards and controls that keep our model and algorithm footprint trustworthy.

Who You Might Be

This role sits at the intersection of data science and software/DevOps, and strong candidates arrive from either side of that line:
A software, DevOps, or platform engineer who has grown toward data science, having started in infrastructure, CI/CD, or production operations and since learned how data science models are built, served, and monitored.
A data scientist who has grown toward infrastructure, DevOps, and MLOps, having started by building models and since moved deliberately toward deployment, reliability, and the engineering discipline of keeping models healthy in production.

What Success Looks Like

The Data Science team spends materially less time on deployment logistics and incident response, and more on new development.
Production issues are caught early, triaged quickly, and resolved or escalated cleanly, with clear ownership.
Deployment becomes a repeatable, well-understood path for the team rather than a per-model project.
Data Science and Architecture operate as two well-aligned sides of one bridge.
