Senior DevOps Engineer

Khealthcareers · New York, NY
$135K–$210K/yr · Full-time · On-site · 3w ago
API Gateway · ArgoCD · AWS · Bash · CDN · CI/CD

About the role

We are looking for a Staff DevOps Engineer to join our DevOps team at K Health. You will own and evolve the infrastructure underpinning a healthcare AI platform serving patients and enterprise health system partners. This is a high-ownership role: you will architect and operate cloud environments across K Health and its enterprise partners, lead complex infrastructure migrations, drive disaster recovery programs, and help build the next generation of AI-powered operations tooling. You will also mentor junior engineers and collaborate closely with product and engineering teams across the company.

This is a hybrid role based in New York City. It requires onsite presence in our New York City office 4 days a week and includes participation in a daytime on-call rotation. The role does not provide immigration support.

Behind every leading health system is K Health's AI-powered virtual care engine. Esteemed health systems like Mayo Clinic, Cedars-Sinai, Mass General Brigham, Hackensack Meridian Health, and Hartford Healthcare partner with K Health to build and run modern primary virtual care clinics on their behalf. Our deeply integrated model modernizes the primary care loop by using AI to put humans first. For our patients, we offer clinical AI (i.e., PatientGPT) and unparalleled access to close care gaps around the clock. For our providers, we deliver provider-serving agentic solutions (i.e., Perfect Note) to eliminate administrative overload and burnout. And for the health systems, we deploy our top-grade Virtualists in AI-powered virtual clinics 24/7 to capture patients' care journeys at step one, retain those journeys within the system for longitudinal care, and strengthen profitability.

Responsibilities

  • Own the design, implementation, and evolution of our GKE-based Kubernetes infrastructure across K Health and enterprise partner environments.
  • Build and maintain our Terraform modular infrastructure library, including reusable modules with automated testing, across GCP, Cloudflare, and AWS.
  • Architect, build, and maintain GitLab CI/CD shared pipeline templates used by all engineering teams (build, test, security scanning, deployment).
  • Own and maintain self-hosted infrastructure software running in-cluster, including GitLab, ArgoCD, Langfuse, DependencyTrack, NGINX Ingress, and others.
  • Implement and support security and compliance controls across infrastructure and the software supply chain: secrets management, pipeline secret detection, container scanning, and SOC2 and HIPAA compliance.
  • Drive disaster recovery readiness: design failover scenarios, author runbooks, and lead periodic DR tests.
  • Lead development of AI-powered operations tooling and agentic infrastructure.
  • Monitor, troubleshoot, and improve production system reliability; respond to incidents during on-call shifts.
  • Mentor junior DevOps engineers and establish team-wide engineering standards.

Requirements

  • 5+ years of experience in DevOps, platform engineering, or site reliability engineering.
  • Deep, hands-on experience with Kubernetes and the surrounding ecosystem - Helm, Helmfile, ArgoCD, Kyverno, cert-manager, and NGINX Ingress.
  • Extensive experience with Google Cloud Platform - GKE, Cloud SQL, Memorystore, Cloud Storage, IAM, and Workload Identity.
  • Strong Terraform expertise: modular architecture, multi-environment provisioning, and automated testing.
  • Advanced knowledge of GitLab CI/CD and GitOps practices.
  • Proficiency in Python and/or Go.
Plus:
  • Advanced Bash scripting skills.
  • Experience with secrets management solutions such as Akeyless or HashiCorp Vault.
  • Experience with database administration across PostgreSQL, Redis, and MongoDB - including DR configuration and operational runbooks.
  • Experience with Datadog or equivalent observability platform (APM, infrastructure, log management).
  • Experience with Cloudflare for DNS, CDN, and security rules management.
  • Demonstrated experience designing and executing disaster recovery programs, including failover testing and runbook authorship.
Bonus:
  • Experience in highly regulated environments - SOC2 and HIPAA.
  • Excellent communication skills with the ability to lead cross-functional infrastructure initiatives.
  • Demonstrated leadership experience, including mentoring junior engineers.
  • Experience with HPC or GPU cluster infrastructure, including Slurm.
  • Experience building or operating AI agents or agentic infrastructure.
  • Experience with microservices architecture and API gateway / reverse proxy patterns.
  • Experience with AWS.
Benefits & Perks:
  • Hybrid work schedule with weekly lunches and stocked fridges
  • Monthly social committees for company events
  • 18 vacation days, 9 company holidays, 5 sick days, and 2 personal days
  • Stock options for every full-time employee
  • Paid parental leave
  • 401k benefit
  • Commuter benefits
  • Competitive health, dental, and vision insurance options

Benefits

  • $135,000 - $210,000 USD
  • Health insurance
  • Dental insurance
  • Vision insurance
  • 401(k)
  • Paid time off
  • Equity / stock options
  • Performance bonus
  • Parental leave
