Principal Operations Engineer, Hardware - Data Center Operations
About the role
We are seeking a Principal Operations Engineer, Hardware to serve as the most senior technical authority for the operational hardware fleet across our hyperscale AI data center portfolio. AI infrastructure lives and dies on the reliability of the compute itself - this role exists to ensure that the GPU systems, servers, and supporting hardware we deploy at scale are operated, maintained, and continuously improved to the standard the workload demands.

You will operate as the technical arm of senior operations leadership in the field - leading site assessments and operational audits, driving the technical readiness of teams ahead of site activation, reviewing hardware platforms and integration designs through an operational lens, and feeding operational learnings back into the hardware engineering, deployment, and supply chain organizations as we shift toward a productized, repeatable build model. You will be a force multiplier across our site hardware leads, deployment teams, and reliability engineers, and the connective tissue between hardware operations, hardware engineering, network, facilities, and customer-facing teams.

The ideal candidate has spent a career operating hardware at scale - in hyperscale data centers, large HPC environments, or comparable 24/7 infrastructure - and is equally comfortable diagnosing a stubborn boot failure on the floor, leading a fleet-wide root cause investigation, and pushing back on a vendor over a flawed RMA process. Formal engineering credentials are valued but not required - practical depth, judgment under pressure, the ability to teach, and the discipline to keep critical infrastructure running through change are what define this role.