Principal Site Reliability Engineer

DraftKings · Remote
Full-time · Remote · Today
AWS · Capacity Planning · Incident Response · Kubernetes · Leadership · Linux


Responsibilities

  • Define and execute the long-term strategy for our Kubernetes platform across Google Kubernetes Engine, Amazon Elastic Kubernetes Service, RKE2, and on-premise environments, ensuring reliability, scalability, and operational consistency.
  • Drive architectural decisions across critical infrastructure, including cluster lifecycle management, networking, identity and access management, observability, autoscaling, capacity planning, and cost optimization.
  • Lead large-scale platform initiatives across multiple engineering teams, establishing technical direction, engineering standards, and measurable outcomes that improve platform reliability and developer experience.
  • Establish and evolve reliability practices by defining service level objectives, service level indicators, and error budget frameworks that align platform performance with business priorities.
  • Build automation-first infrastructure through Infrastructure as Code, GitOps workflows, self-healing systems, and internal platform tooling that improve engineering velocity and reduce operational overhead.
  • Champion the responsible adoption of AI-powered engineering capabilities that improve operational efficiency, accelerate incident response, and enhance developer productivity.
  • Lead critical platform incidents, drive post-incident improvements, and strengthen platform resilience through automation, capacity planning, and operational excellence.
  • Mentor senior engineers, influence technical strategy across the organization, and elevate engineering excellence through architecture reviews, coaching, and technical leadership.

Requirements

  • A Bachelor's Degree in Computer Science or a related technical field.
  • At least 8 years of experience designing, operating, and scaling distributed cloud and on-premise infrastructure, including at least 3 years operating at the Staff, Principal, or equivalent technical leadership level.
  • Proven experience leading large-scale infrastructure or platform initiatives that require cross-functional alignment and long-term technical ownership.
  • Deep expertise with Kubernetes, including cluster architecture, networking, storage, security, operators, lifecycle management, and large-scale production operations.
  • Extensive experience building and operating production infrastructure in AWS and Google Cloud Platform using Infrastructure as Code technologies such as Terraform, Pulumi, or similar tools.
  • Strong software development experience in Go, Python, or both, with expertise in GitOps, continuous integration and continuous delivery, observability, distributed systems, Linux, and reliability engineering principles.
  • Experience incorporating AI-powered tools into engineering workflows while applying sound judgment around reliability, security, and operational risk.

Benefits

  • Equity / stock options
  • Performance bonus

Additional Information

At DraftKings, AI is becoming an integral part of both our present and future, powering how work gets done today, guiding smarter decisions, and sparking bold ideas. It's transforming how we enhance customer experiences, streamline operations, and unlock new possibilities. Our teams are energized by innovation and readily embrace emerging technology. We're not waiting for the future to arrive. We're shaping it, one bold step at a time. To those who see AI as a driver of progress, come build the future together.

The Crown Is Yours

As a Principal Site Reliability Engineer, you'll shape the long-term strategy for the infrastructure behind one of the most demanding platforms in sports betting and gaming. You'll drive the architectural direction of our cloud and on-premise platforms, helping engineering teams build, deploy, and operate highly reliable systems at scale. Working across Platform Engineering and Site Reliability Engineering, you'll influence how we modernize our infrastructure, strengthen operational excellence, and prepare our platform for the next generation of growth.


