
Principal Software Engineer - Rack Scale Systems Infrastructure

NVIDIA · Santa Clara, CA
Full-time · On-site · 3w ago
Go · Rust · Kubernetes · iOS


Responsibilities

  • Define the complete software architecture for rack-scale infrastructure products and services, covering control plane services, infrastructure management, firmware, operating systems, kernel drivers, networking fabrics, accelerator software, and user-mode manageability software.
  • Mentor senior engineers and technical leads, raising the engineering bar for large-scale networked systems, foundational software, and rack-scale control plane development.
  • Make high-quality technical decisions in ambiguous environments, balancing customer needs, schedule, hardware realities, software maintainability, open source adoption, and long-term infrastructure evolution.

What We Need To See

  • BS or MS in Computer Engineering, Computer Science, Electrical Engineering, or a related field, or equivalent experience.
  • 15+ years of proven experience in systems architecture, system software, distributed systems, infrastructure control planes, or infrastructure engineering.
  • Solid architectural knowledge of coordination frameworks, state machines, declarative APIs, reconciliation loops, lifecycle orchestration, failure handling, upgrade and rollback workflows, and distributed systems tradeoffs.
  • Practical coding skills in Go, C++, or Rust, including the ability to write, review, and direct production-quality infrastructure software. Experience with Rust is highly valued.
  • Strong understanding of data center networking technologies and protocols, such as Ethernet, InfiniBand, RDMA, and fabric-level manageability.
  • Experience with complex accelerator-based systems, including GPUs, DPUs, FPGAs, custom silicon, or other high-performance computing systems.
  • Experience crafting software intended for open source release, including API stability, modularity, documentation, community usability, and clean separ

Additional Information

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing, an era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent. As an NVIDIAN, you'll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.

At NVIDIA, as a Principal Rack Scale Systems Infrastructure Engineer, you will build and guide the development of the software systems that support our upcoming rack-scale infrastructure products and services. This exceptional role sits where software meets hardware. You will work on control planes, state machines, orchestration systems, firmware, OS lifecycle, and networking fabrics. Your task is to build infrastructure-as-a-service control plane software that converts complex rack-scale hardware into dependable, manageable, and programmable infrastructure for NVIDIA, partners, and leading cloud and enterprise clients globally.


