Senior Software Engineer, DGX Cloud Production Engineering

NVIDIA · Santa Clara, CA

Full-time · On-site · 2d ago

Python · Go · Express · Kubernetes · Terraform

Responsibilities

Build and operate automation for large-scale GPU clusters across NVIDIA Cloud Partners (NCP) and on-prem environments.
Develop tools and services for provisioning, validation, upgrades, monitoring, repair, and cluster lifecycle operations.
Improve Day 0 / Day 1 / Day 2 workflows for cluster bring-up, handoff, and production operations.