Senior / Staff Technical Program Manager - Datacenter Capacity Delivery (E2E)

Cerebras Systems · Remote

Full-time · Remote · Posted 2w ago

AWS · Capacity Planning · Compliance · Generative AI · Leadership · Machine Learning

About the role

The DC Delivery E2E TPM is the single-threaded owner for delivering data center capacity from forecast → site strategy → design → construction → infrastructure readiness → go-live. You will operate as the SSOT (Single Source of Truth) for delivery milestones, risks, and capacity outcomes while orchestrating cross-functional execution across internal teams and external partners. This is a frontier-scale role where ambiguity is high, timelines are compressed, and stakes are critical to company growth.

Responsibilities

End-to-End Capacity Delivery
Own delivery of AI-optimized data center capacity (colo, build-to-suit, retrofits, and owned facilities) from pre-contract planning through operational readiness.
Deliver MW-scale infrastructure aligned to aggressive GPU/AI system deployment targets.
Drive clarity from ambiguity: translate high-level demand signals into executable delivery programs.
Program Structuring & Execution
Decompose complex build programs into workstreams with clear owners, milestones, and deliverables.
Build integrated plans spanning real estate, power/energy, design, procurement, construction, and deployment.
Establish critical path visibility and aggressively manage schedule compression.
Cross-Functional Leadership
Orchestrate execution across:
Real estate & site selection
Power & energy strategy (utilities, PPAs, interconnects)
Data center design (MEP, liquid cooling, high-density racks)
Supply chain & long-lead equipment procurement
Construction & commissioning
Infrastructure deployment (rack/cluster install)
Networking & backbone connectivity
Security, compliance, and operations readiness
Act as the primary interface with colocation providers, EPCs, utilities, and key vendors.
Risk, Cost & Governance
Identify and drive resolution of critical risks, constraints, and blockers across power, equipment, permitting, and supply chain.
Own and maintain program budgets, CapEx forecasts, and capital allocation narratives.
Provide structured updates and escalation paths to executive leadership.
Demand & Capacity Planning Integration
Partner with capacity planning, AI infrastructure, and finance teams to translate model demand into site-level capacity strategies.
Align build plans with power availability, network topology, and hardware rollout schedules.
Continuously optimize for time-to-capacity and cost-per-MW / cost-per-GPU deployed.
Operational Excellence
Drive E2E improvements in delivery through:
Standardization of build and commissioning processes
Implementation of program tooling and dashboards
Post-mortems and lessons-learned loops
Establish scalable mechanisms to support rapid global expansion.
Communication & Leadership
Serve as SSOT for program health, milestones, and risks.
Deliver concise, high-signal updates to senior executives.
Operate effectively across distributed teams with up to 50% travel.

Requirements

12-15+ years in mission-critical facilities or data center operations
Experience managing multi-site, vendor-heavy environments
Strong expertise in electrical and mechanical systems
Proven track record in improving uptime and performance
Experience at hyperscalers (Google, Meta, Microsoft, AWS) or neo-cloud / AI infra companies (CoreWeave, Lambda, etc.).
Familiarity with high-density AI workloads (liquid cooling, >30kW racks, GPU clusters).
Experience with:
Utility engagement and power delivery constraints
Long-lead supply chain planning (transformers, switchgear, chillers)
Commissioning and data center handover processes
Ability to operate in high-growth, ambiguous environments with limited structure.
Strong executive presence and ability to influence without authority.
What sets you apart
Extreme ownership mindset - you treat capacity delivery as a personal SL

Benefits

Health insurance

Additional Information

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications, without the hassle of managing hundreds of GPUs or TPUs. Cerebras' current customers include top model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of scale, transforming key workloads with ultra-high-speed inference. Thanks to the groundbreaking wafer-scale architecture, Cerebras Inference offers the fastest generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. This order-of-magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.
