Staff Software Engineer, Compute Architecture

CoreWeave · Manhattan, NY
$188K–$275K/yr · Full-time · On-site · 2w ago
CI/CD · Design Systems · Grafana · gRPC · Incident Response · Kafka

About the role

As a Staff Software Engineer within our Compute Architecture organization, you will help build the software systems that operate the backbone of our large-scale GPU data centers. The METALDEV team builds Go-based distributed services that bring new infrastructure online, manage hardware lifecycle workflows, monitor production health, and automate safe operations across fleets of GPU servers and rack-scale systems. This is a software-first role at the intersection of distributed systems, production reliability, and hardware-aware automation, where your work directly improves the reliability, safety, and scalability of real-world infrastructure.

Responsibilities

  • Design, build, and operate Go-based services that manage the lifecycle of large-scale GPU data center infrastructure.
  • Build automation for data center bring-up, hardware discovery, health monitoring, remediation, and production operations.
  • Develop reliable APIs, services, and workflows for managing BMCs, firmware state, server health, and rack-level infrastructure.
  • Improve observability, alerting, and operational tooling so production issues can be detected, understood, and resolved quickly.
  • Translate incidents and hardware failure modes into software improvements that make the platform more resilient.
  • Partner with hardware-adjacent, infrastructure, operations, and software teams to design systems that work safely at fleet scale.
  • Provide technical leadership through design reviews, code reviews, architectural guidance, and mentorship.
  • Make pragmatic architecture decisions that balance reliability, simplicity, scalability, and operational burden.

Requirements

  • B.S., M.S., or PhD in Computer Science or related field, or equivalent experience.
  • 8+ years of software engineering experience with a strong focus on infrastructure, cloud engineering, and distributed databases, particularly within large-scale data center and cloud environments.
  • Expertise in Go and proven experience building REST/gRPC APIs for mission-critical platforms.
  • Strong background in architecting and scaling cloud-native Kubernetes infrastructure and distributed services.
  • Proven success in mentoring engineers, leading technical projects, and influencing engineering strategy across teams.
  • Experience contributing to and collaborating with open source communities.
  • Skilled in applying a data-driven approach to reliability, optimization, and continuous improvement.
  • Excellent communicator able to work effectively with both technical and non-technical stakeholders.
  • Hands-on experience with observability stacks (Prometheus, Grafana, PromQL), CI/CD pipelines, and operating large fleets of GPU servers.
  • Track record of leading incident response, postmortems, and driving robust service reliability.

Nice-to-Have Skills

  • Working knowledge of Kafka, ClickHouse, and CockroachDB (CRDB).
  • Familiarity with DMTF standards, Redfish APIs, and GPU servers.

Wondering if you're a good fit? We believe in investing in our people, and we value candidates who bring their own diverse experiences to our teams, even if you aren't a 100% skill or experience match.

Why CoreWeave?

  • Be Curious at Your Core
  • Act Like an Owner
  • Empower Employees
  • Deliver Best-in-Class Client Experiences
  • Achieve More Together

Benefits

  • Health insurance
  • Equity / stock options
  • Performance bonus

Additional Information

CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at www.coreweave.com.
