Senior Software Engineer, Cluster Orchestration
About the role
As part of the Cluster Orchestration team, you will play a key role in advancing CoreWeave's orchestration platform, including SUNK (Slurm on Kubernetes) and beyond, the Kubernetes-native foundation that powers AI training and inference at scale. This is an opportunity to help shape one of the most critical layers of the AI cloud: ensuring workloads run seamlessly, reliably, and efficiently across massive GPU clusters. By building the systems that eliminate infrastructure bottlenecks and create new orchestration capabilities, you will directly empower customers to innovate faster and push the boundaries of what's possible with AI.
Requirements
- ~3-5 years of professional software engineering experience building distributed systems or cloud services.
- Strong coding skills in Go (Python or C++ a plus) with solid CS fundamentals.
- Hands-on experience running Kubernetes at production scale.
- Familiarity with observability stacks (Prometheus, Grafana, OpenTelemetry).
- Proven ability to improve service reliability and performance using metrics (P95/P99 latency, throughput, error budgets).
Preferred
- Familiarity with orchestration and workflow technologies such as Ray, Kubeflow, Kueue, Istio, Knative, or Argo Workflows.
- Experience with distributed workloads, GPU-based applications, or ML pipelines.
- Knowledge of scheduling concepts like quota enforcement, pre-emption, and scaling strategies.
- Exposure to reliability practices including SLOs, alarms, and post-incident reviews.
Additional Information
CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at www.coreweave.com.
CoreWeave is the AI Hyperscaler™, delivering a cloud platform of cutting-edge services powering the next wave of AI. Our technology provides enterprises and leading AI labs with the most performant, efficient, and resilient solutions for accelerated computing. Since 2017, CoreWeave has operated a growing footprint of data centers covering every region of the US and across Europe. CoreWeave was ranked as one of the TIME100 most influential companies of 2024.
As the leader in the industry, we thrive in an environment where adaptability and resilience are key. Our culture offers career-defining opportunities for those who excel amid change and challenge. If you're someone who thrives in a dynamic environment, enjoys solving complex problems, and is eager to make a significant impact, CoreWeave is the place for you. Join us, and be part of a team solving some of the most exciting challenges in the industry. CoreWeave powers the creation and delivery of the intelligence that drives innovation.