Member of Technical Staff - Distributed Systems
About the role
Gimlet is building the next generation of AI infrastructure: large-scale AI datacenters and the orchestration platform that coordinates them. The future of AI will require vastly more compute than exists today. But as AI workloads become more complex and new hardware architectures emerge, simply deploying more GPUs isn't enough. The challenge is making increasingly diverse compute work together.
Gimlet's platform intelligently partitions and routes workloads across heterogeneous hardware, enabling step-function improvements in performance and efficiency. Customers deploy through production-grade APIs without needing to think about hardware selection, placement, or optimization. We work with foundation labs, hyperscalers, and AI-native companies to power production workloads at massive scale and help define the infrastructure layer for the future of AI.
Gimlet Labs is seeking a Member of Technical Staff focused on distributed systems. In this role, you will build the core platform that schedules, routes, and operates AI workloads reliably at production scale. You will work on systems that coordinate execution across thousands of nodes, expose stable production APIs, and ensure workloads run predictably under real-world load and failure conditions. This role is well-suited for engineers who enjoy building foundational infrastructure, understanding systems end-to-end, and operating at scale.
Responsibilities
- Design and build distributed systems that orchestrate and operate AI workloads at large scale
- Develop scheduling, routing, and resource management components that coordinate execution across many nodes and services
- Build production-grade APIs and control planes for deploying and managing workloads
- Implement mechanisms for reliability, availability, and fault tolerance in distributed environments
- Instrument systems for observability and debugging at scale
- Work closely with compilers, runtimes, and hardware to ensure end-to-end system correctness and performance
You may be a good fit if you have
- Strong software engineering fundamentals
- Experience building or operating distributed systems in production environments
- Comfort reasoning about concurrency, failure modes, and tradeoffs in large-scale systems
Strong candidates may also have
- Experience with Kubernetes or Kubernetes-adjacent systems beyond basic usage
- Experience designing service-oriented architectures using RPC or asynchronous messaging
- Familiarity with scheduling, queues, or resource management systems
- Experience building reliable APIs and running systems under high load
- Software development experience in languages commonly used for systems development (e.g., Go, C++, Python)
What Makes Gimlet Different
As an early member of the team, you will have significant ownership, work alongside highly technical engineers, and help shape both the systems we build and how we scale the company.
We value people who are excited to work across domains, take ownership of meaningful problems, and build technology that enables the next generation of AI.