Senior Engineer, Inference Control Plane
Responsibilities
- Design and build scalable, multi-tenant services that power AI inference and intelligent routing workloads.
- Develop and operate high-scale distributed systems with strong reliability, availability, and performance goals.
- Strengthen platform resiliency through improved observability, capacity management, automation, and operational tooling.
- Partner closely with platform, GPU infrastructure, and product engineering teams to deliver production-grade systems and highly available APIs.
- Raise the engineering bar through strong software design, operational discipline, incident management, and continuous improvement practices.
- Contribute to architecture decisions around traffic management, service orchestration, reliability, and platform scalability.
- Participate in on-call rotations and lead efforts to reduce operator pain, improve service health, and prevent recurring incidents.
Requirements
Required
- 5+ years of experience building and operating multi-tenant platforms or distributed backend systems
- Strong experience operating high-scale distributed services in production environments
- Deep understanding of SRE principles, including observability, incident management, reliability engineering, capacity planning, and operational automation
- 1+ years of hands-on experience with Go / Golang in production systems
- 1+ years of experience with Kubernetes
- Strong understanding of cloud-native architectures, microservices, and distributed systems fundamentals
- Experience debugging performance, scalability, and reliability issues in production systems
- Observability Proficiency: Experience tracking infrastructure and inference metrics like Time To First Token (TTFT), Time Per Output Token (TPOT), and GPU utilization.
Bonus
- AI/ML Framework Knowledge: Understanding of modern LLM serving architectures and familiarity with engines like vLLM or Triton.
- Experience with API gateways, traffic routing, or service mesh technologies
- Familiarity with LLM serving stacks such as vLLM, TensorRT-LLM, or similar technologies
- Experience building systems for inference optimization, rate limiting, routing, or workload orchestration
- Compensation Range: $139,000 - $174,000
- *This is a hybrid role
- JR: 2026-7622
- #LI-Hybrid
- Why You'll Like Working for DigitalOcean
- DigitalOcean is an equal-opportunity employer. We do not discriminate on the
Benefits
Additional Information
Dive in and do the best work of your career at DigitalOcean. Journey alongside a strong community of top talent who are relentless in their drive to build the simplest scalable cloud. If you have a growth mindset, naturally like to think big and bold, and are energized by the fast-paced environment of a true industry disruptor, you'll find your place here. We value winning together while learning, having fun, and making a profound difference for the dreamers and builders in the world.

We are seeking a Senior Engineer to implement and contribute to the design and optimization of our Serverless Inference infrastructure and APIs. In this role, you will tackle the challenges of large-scale AI workloads, focusing on throughput, GPU utilization, and fault tolerance to support the next-generation inference needs of AI-native enterprises.