Senior Systems Software Engineer, AI Stack and Performance - DGX Station

NVIDIA · Santa Clara, CA
Full-time · On-site · 2w ago
Python · TensorFlow · PyTorch · iOS · Microservices

Responsibilities

  • Multi-User & Concurrency: Validate multi-user and concurrent workload scenarios: multiple users running simultaneous training jobs, inference serving alongside development, and resource isolation via MIG or time-slicing. Ensure DGX Station performs reliably as a shared workstation.
  • Stack Validation: Validate the full NVIDIA AI software stack on DGX Station: CUDA toolkit, cuDNN, TensorRT, NCCL, Triton Inference Server, DCGM, and DOCA/OFED. Ensure version compatibility, functional correctness, and performance parity with reference data center configurations.

What we need to see

  • BS or MS in Computer Science, Electrical Engineering, or a related field, or equivalent experience.
  • 12+ years in systems software engineering with hands-on experience in AI/ML workload optimization, GPU performance analysis, or deep learning infrastructure.
  • Strong proficiency with deep learning frameworks (PyTorch, TensorFlow, or JAX), including internals: graph execution, operator dispatch, memory management, and custom kernel integration.
  • Experience profiling and optimizing GPU workloads using Nsight Systems, Nsight Compute, CUPTI, or equivalent. Ability to read GPU traces and translate observations into actionable optimizations.
  • Strong understanding of GPU architecture: compute units, memory hierarchy, NVLink, multi-GPU scaling, and how they impact AI workload performance.
  • Experience with inference optimization: quantization (INT8/FP8), model compilation (TensorRT, torch.compile), batching strategies, and serving frameworks.
  • Proficiency in C/C++, CUDA, and Python. Comfortable reading and modifying GPU kernels.

Ways to stand out from the crowd

  • Experience optimizing LLM training or inference on multi-GPU NVIDIA systems (DGX, HGX, or multi-GPU workstations).

Additional Information

DGX Station (Galaxy) is NVIDIA's workstation-class AI computer, built on GB300 Blackwell GPUs with NVLink interconnect and delivering data-center-grade AI compute in a deskside form factor. DGX Station is shipped to OEM and OSV partners as a complete SW/FW GA release including firmware bundles, DGX BaseOS, GPU drivers, CUDA toolkit, DCGM, and DOCA/OFED. For DGX Station to deliver on its promise, AI applications like NemoClaw, LLM inference via NIM, Hermes agents, and deep learning frameworks must run production-ready out of the box, optimized for the multi-GPU, high-bandwidth architecture of this platform.

We are looking for a deeply technical systems software engineer who will own AI stack readiness on DGX Station. You will profile workloads, identify bottlenecks across GPU compute, NVLink, memory, and host interconnects, and drive optimizations across the full stack, from GPU kernels through frameworks to applications. You will also work hands-on with framework, compiler, and GPU architecture teams to ensure DGX Station delivers best-in-class performance for real AI workloads in multi-user and multi-GPU configurations.