Senior Engineer, Inference Control Plane

Digitalocean98 · Seattle
$139K–$174K/yr · Full-time · On-site · 1w ago
Capacity Planning · DigitalOcean · Kubernetes · Microservices · Observability · Routing

Responsibilities

  • Design and build scalable, multi-tenant services that power AI inference and intelligent routing workloads.
  • Develop and operate high-scale distributed systems with strong reliability, availability, and performance goals.
  • Strengthen platform resiliency through improved observability, capacity management, automation, and operational tooling.
  • Partner closely with platform, GPU infrastructure, and product engineering teams to deliver production-grade systems and highly available APIs.
  • Raise the engineering bar through strong software design, operational discipline, incident management, and continuous improvement practices.
  • Contribute to architecture decisions around traffic management, service orchestration, reliability, and platform scalability.
  • Participate in on-call rotations and lead efforts to reduce operator pain, improve service health, and prevent recurring incidents.

Requirements

Required

  • 5+ years of experience building and operating multi-tenant platforms or distributed backend systems
  • Strong experience operating high-scale distributed services in production environments
  • Deep understanding of SRE principles, including observability, incident management, reliability engineering, capacity planning, and operational automation
  • 1+ years of hands-on experience with Go / Golang in production systems
  • 1+ years of experience with Kubernetes
  • Strong understanding of cloud-native architectures, microservices, and distributed systems fundamentals
  • Experience debugging performance, scalability, and reliability issues in production systems
  • Observability Proficiency: Experience tracking infrastructure and inference metrics like Time To First Token (TTFT), Time Per Output Token (TPOT), and GPU utilization.

Bonus

  • AI/ML Framework Knowledge: Understanding of modern LLM serving architectures and familiarity with engines like vLLM or Triton.
  • Experience with API gateways, traffic routing, or service mesh technologies
  • Familiarity with LLM serving stacks such as vLLM, TensorRT-LLM, or similar technologies
  • Experience building systems for inference optimization, rate limiting, routing, or workload orchestration

Compensation Range: $139,000 - $174,000

*This is a hybrid role

JR: 2026-7622

Why You'll Like Working for DigitalOcean

DigitalOcean is an equal-opportunity employer. We do not discriminate on the

Benefits

  • Health insurance
  • Paid time off
  • Flexible schedule
  • Equity / stock options
  • Performance bonus

Additional Information

Dive in and do the best work of your career at DigitalOcean. Journey alongside a strong community of top talent who are relentless in their drive to build the simplest scalable cloud. If you have a growth mindset, naturally like to think big and bold, and are energized by the fast-paced environment of a true industry disruptor, you'll find your place here. We value winning together while learning, having fun, and making a profound difference for the dreamers and builders in the world.

We are seeking a Senior Engineer to implement and contribute to the design and optimization of our Serverless Inference infrastructure and APIs. In this role, you will tackle the challenges of large-scale AI workloads, focusing on throughput, GPU utilization, and fault tolerance to support the next-generation inference needs of AI-native enterprises.


Interested in this role?

Apply on the company's website.
