Technical Program Manager, Data Centers

Nebius · Amsterdam, Netherlands

Full-time · Remote · 1d ago

Capacity Planning · Compliance · Incident Response · Leadership · Move · SOC 2

About the role

We are looking for a Technical Program Manager to own the operational readiness and ongoing health of our fleet of Data Centers, both COLO and BTS sites. In this role you will be the single point of accountability for ensuring each site runs as expected - SLAs met, maintenance executed on schedule, and audits passed - across a growing portfolio of landlord-operated and purpose-built facilities. You will operate as the primary interface between Nebius and our data center landlords and operators, and you will partner closely with the Nebius IT team to translate site-level operations into reliable infrastructure for our customers.

This is an individual contributor position for someone who is equally comfortable in a contractual SLA review, a maintenance window planning call, and a physical site audit. You will define the mechanisms that keep our sites accountable and surface risk before it becomes an incident.

Key job responsibilities

Own the operational health of Nebius COLO and BTS sites, ensuring each facility runs to expectation across power, cooling, space, connectivity, security, and environmental controls.
Track, monitor, and enforce SLA compliance across landlords and colocation providers; identify breaches, drive remediation, and hold providers accountable to contractual commitments.
Manage and coordinate site maintenance schedules - preventive and corrective - including planning and approving maintenance windows, reviewing Methods of Procedure (MOPs), and minimizing risk to live workloads.
Plan and drive site audits covering compliance, capacity, power/cooling performance, physical security, and safety; track findings to closure.
Serve as the primary day-to-day interface with data center landlords and operators, managing the operational relationship, escalations, and coordination of on-site activity.
Partner closely with the Nebius IT team on deployments, capacity planning, incident response, and change management at each site.
Build reporting mechanisms and dashboards that give leadership clear visibility into site health, SLA performance, maintenance status, and open risk across the portfolio.
Lead incident coordination and post-incident follow-up, including root cause analysis and corrective action tracking with landlords and internal teams.
Track and manage contractual operational obligations, deliverables, and timelines across multiple sites and providers simultaneously.

The Data Center team is responsible for the physical infrastructure that underpins Nebius' AI cloud. We manage the full lifecycle of our COLO and BTS footprint - from bringing new capacity online to keeping live sites running reliably at scale. We work at the intersection of facilities operations, vendor management, and IT infrastructure, and we move fast because our customers' AI workloads depend on the reliability we deliver.

Requirements

10+ years of experience in technical program management, data center operations, or critical facilities/infrastructure management.
Experience managing data center infrastructure and operations (power, cooling, space, connectivity) in colocation, build-to-suit, or owned environments.
Experience managing third-party vendors, landlords, or service providers against SLAs and contractual obligations.
Demonstrated ability to manage multiple programs, sites, or workstreams simultaneously and drive them to measurable outcomes.
Bachelor's degree in a relevant field, or equivalent practical experience.
Direct experience with colocation (COLO) and build-to-suit (BTS) data center models, including operating across multiple landlords and operators.
Working knowledge of data center SLAs, MOPs/SOPs, maintenance regimes, and audit and compliance frameworks (e.g., Uptime Institute Tier standards, SOC 2, ISO 27001).
Experience supporting AI/HPC, GPU cluster, or other high-density compute infrastructure.
Strong familiarity with incident management and root cause analysis in a critical facilities context.
Experience building reporting mechanisms, dashboards, or operational scorecards for infrastructure health and risk.
PMP, Uptime ATD, or equivalent program/operations certification.

Benefits

Health insurance

Additional Information

About Nebius: Nebius is leading a new era in cloud infrastructure for the global AI economy. We are building a full-stack AI cloud platform that supports developers and enterprises from data and model training through to production deployment, without the cost and complexity of building large in-house AI/ML infrastructure. Built by engineers, for engineers. From large-scale GPU orchestration to inference optimization, we own the hard problems across compute, storage, networking and applied AI. Listed on Nasdaq (NBIS) and headquartered in Amsterdam, we have a global footprint with R&D hubs across Europe, the UK, North America and Israel. Our team of 1,500+ includes hundreds of engineers with deep expertise across hardware, software and AI R&D.

Interested in this role?

Apply on the company's website.