Senior Software Engineer, Compute Architecture
About the role
As a Senior Software Engineer within our Compute Architecture organization, you will help build the software control plane for hardware lifecycle management across large-scale GPU data centers. The METALDEV team builds Go-based distributed services that bring infrastructure online, monitor production hardware health, automate safe operational workflows, and give operators the observability and control needed to manage GPU servers and rack-scale systems with reliability and confidence. This is a software-first role at the intersection of distributed systems, production reliability, and hardware-aware automation, ideal for engineers who want their code to operate real-world infrastructure at massive scale.
Responsibilities
- Design, build, and operate Go-based services that manage the lifecycle of large-scale GPU data center infrastructure.
- Build automation for data center bring-up, hardware discovery, health monitoring, remediation, and production operations.
- Develop reliable APIs, services, and workflows for managing BMCs, firmware state, server health, and rack-level infrastructure.
- Improve observability, alerting, and operational tooling so production issues can be detected, understood, and resolved quickly.
- Translate incidents and hardware failure modes into software improvements that make the platform more resilient.
- Partner with hardware-adjacent, infrastructure, operations, and software teams to design systems that work safely at fleet scale.
Requirements
- 5+ years of experience building and operating infrastructure or backend systems.
- Bachelor's or Master's degree in Computer Science or a related field, or equivalent practical experience.
- Strong proficiency in Go for building production services and tools.
- Experience designing and building gRPC and REST APIs.
- Experience with Kubernetes and containerized workloads in production environments.
- Familiarity with observability tooling such as Prometheus and Grafana.
Preferred
- Experience working with GPU-based systems.
- Experience with low-level hardware management such as BMCs or Redfish.
- Experience operating large-scale distributed systems or high-throughput infrastructure.
- Experience collaborating with or contributing to open-source projects (for example, Go, Redfish).
Wondering if you're a good fit?
We believe in investing in our people, and value candidates who can bring their own diversified experiences to our teams - even if you aren't a 100% skill or experience match. Here are a few qualities we've found compatible with our team. If some of this describes you, we'd love to talk.
- You enjoy working close to the hardware and are curious about how GPUs, servers, and data centers fit together.
- You thrive in infrastructure environments where reliability, performance, and automation matter as much as features.
- You like collaborating across hardware, platform, and product teams to solve complex, ambiguous problems.
Why CoreWeave?
- Be Curious at Your Core
- Act Like an Owner
- Empower Employees
- Deliver Best-in-Class Client Experiences
- Achieve More Together
Additional Information
CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at www.coreweave.com.