Site Reliability and DevOps Engineering Lead

Merative · Remote
Full-time · Remote · 6d ago
Bash · Capacity Planning · CI/CD · Git · Incident Response · Java

Responsibilities

People & Team Leadership
  • Lead, mentor, and grow Platform / DevOps engineers
  • Build a high-performing Platform team
  • Drive accountability for platform reliability and delivery outcomes
  • Lead vendors in delivering capabilities to production

Production Engineering & Platform Operations
  • Ensure platform capabilities accelerate product delivery and remove bottlenecks
  • Define and enforce platform engineering standards and DevOps practices across all teams and vendors
  • Lead capacity planning, performance optimization, and cost efficiency
  • Define operational standards, runbooks, and reliability practices
  • Be accountable for platform reliability outcomes at the enterprise and product level

Platform Strategy and Leadership
  • Act as technical authority across platform, reliability, and delivery
  • Define platform strategy and roadmap
  • Govern delivery across internal teams and vendors

Platform Reliability Ownership
  • Own SLIs, SLOs, and error budgets
  • Lead resilience engineering, observability, and design for failure
  • Drive proactive risk reduction and continuous improvement
  • Own incident management frameworks and continuous improvement

CI/CD and Release Engineering
  • Own end-to-end pipeline architecture and release automation
  • Standardize, secure, and fully automate pipelines
  • Drive continuous integration, delivery, and validation practices

Incident Leadership
  • Lead Sev1 response, escalation, and recovery
  • Own root cause analysis (RCA) and drive systemic fixes rather than point fixes
  • Introduce AI-enabled pipeline optimization and quality gates
  • Embed AI into monitoring, risk prediction, and CI/CD optimization
  • Drive automation to reduce operational toil and improve decision-making

Required Skills

  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • 6-10 years of hands-on experience in software operations, DevOps, and Site Reliability Engineering, including managing large-scale, mission-critical systems.
  • Clear and confident communication skills, with the ability to lead teams and collaborate effectively across engineering, product, and architecture teams.
  • Proven track record ensuring high availability and performance in production environments, with expertise in fault-tolerant, distributed system design.
  • Excellent understanding of modern software delivery pipelines and DevOps practices, including CI/CD, configuration management, and version control (Git).
  • Exceptional problem-solving skills, with experience diagnosing complex system issues under pressure and driving them to resolution.
  • Strong proficiency in at least one programming or scripting language (e.g., Python, Bash, or Java) for automation and tool integration.
  • Self-driven and proactive, with a passion for automating manual processes and continuo

Benefits

Health insurance

Additional Information

Micromedex by Merative is a trusted clinical decision support solution used by clinicians in thousands of hospitals, health systems, payers, and government agencies worldwide. For over 50 years, we've delivered evidence-based drug, toxicology, and disease information to help clinicians make confident, timely decisions and educate patients at the point of care. Today, Micromedex is evolving. With a modernized homepage and AI-powered search, clinicians can now find precise answers faster, supported by rigorously validated, evidence-based content. Our portfolio includes drug reference, IV compatibility, pediatric dosing, toxicology databases, and integrated calculators, all accessible via web and mobile. By combining authoritative content with intuitive, AI-enhanced tools, Micromedex empowers healthcare organizations to improve medication safety, reduce adverse events, and deliver better patient outcomes.

Micromedex is seeking a highly skilled Platform Reliability & DevOps Engineering Lead who combines deep hands-on expertise in cloud services, infrastructure, and automation with a strong architectural understanding of distributed, high-availability systems. You will lead the platform team, ensuring our mission-critical clinical platform is highly available (24×7), performant, scalable, and secure.

This role is both strategic and hands-on: you will define and drive the platform reliability and DevOps strategy, continuously improving system resilience and CI/CD capability, while partnering closely with engineering teams and vendors to embed operational excellence across the software lifecycle. You will be accountable for the end-to-end reliability, operability, and delivery capability of the Micromedex platform, unifying Site Reliability Engineering, DevOps, and CI/CD ownership into a single platform function. This includes owning platform reliability outcomes, DevOps enablement, and delivery pipelines to support scalable, high-availability systems and faster, safer releases.

You are passionate about automation, proactive in addressing reliability and performance challenges, and committed to maintaining the trust of clinicians worldwide through resilient system design, strong operational discipline, and rapid incident response.

