Senior / Staff Technical Program Manager - Datacenter Capacity Delivery (E2E)
About the role
The DC Delivery E2E TPM is the single-threaded owner for delivering data center capacity from forecast → site strategy → design → construction → infrastructure readiness → go-live. You will operate as the SSOT (Single Source of Truth) for delivery milestones, risks, and capacity outcomes while orchestrating cross-functional execution across internal teams and external partners. This is a frontier-scale role where ambiguity is high, timelines are compressed, and the outcomes are critical to company growth.
Responsibilities
- End-to-End Capacity Delivery
  - Own delivery of AI-optimized data center capacity (colo, build-to-suit, retrofits, and owned facilities) from pre-contract planning through operational readiness.
  - Deliver MW-scale infrastructure aligned to aggressive GPU/AI system deployment targets.
  - Drive clarity from ambiguity: translate high-level demand signals into executable delivery programs.
- Program Structuring & Execution
  - Decompose complex build programs into workstreams with clear owners, milestones, and deliverables.
  - Build integrated plans spanning real estate, power/energy, design, procurement, construction, and deployment.
  - Establish critical path visibility and aggressively manage schedule compression.
- Cross-Functional Leadership
  - Orchestrate execution across:
    - Real estate & site selection
    - Power & energy strategy (utilities, PPAs, interconnects)
    - Data center design (MEP, liquid cooling, high-density racks)
    - Supply chain & long-lead equipment procurement
    - Construction & commissioning
    - Infrastructure deployment (rack/cluster install)
    - Networking & backbone connectivity
    - Security, compliance, and operations readiness
  - Act as the primary interface with colocation providers, EPCs, utilities, and key vendors.
- Risk, Cost & Governance
  - Identify and drive resolution of critical risks, constraints, and blockers across power, equipment, permitting, and supply chain.
  - Own and maintain program budgets, CapEx forecasts, and capital allocation narratives.
  - Provide structured updates and escalation paths to executive leadership.
- Demand & Capacity Planning Integration
  - Partner with capacity planning, AI infrastructure, and finance teams to translate model demand into site-level capacity strategies.
  - Align build plans with power availability, network topology, and hardware rollout schedules.
  - Continuously optimize for time-to-capacity and cost-per-MW / cost-per-GPU deployed.
- Operational Excellence
  - Drive E2E improvements in delivery through:
    - Standardization of build and commissioning processes
    - Implementation of program tooling and dashboards
    - Post-mortems and lessons learned loops
  - Establish scalable mechanisms to support rapid global expansion.
- Communication & Leadership
  - Serve as SSOT for program health, milestones, and risks.
  - Deliver concise, high-signal updates to senior executives.
  - Operate effectively across distributed teams with up to 50% travel.
Requirements
- 12-15+ years in mission-critical facilities or data center operations
- Experience managing multi-site, vendor-heavy environments
- Strong expertise in electrical and mechanical systems
- Proven track record of improving uptime and performance
- Experience at hyperscalers (Google, Meta, Microsoft, AWS) or neo-cloud / AI infra companies (CoreWeave, Lambda, etc.).
- Familiarity with high-density AI workloads (liquid cooling, >30 kW racks, GPU clusters).
- Experience with:
  - Utility engagement and power delivery constraints
  - Long-lead supply chain planning (transformers, switchgear, chillers)
  - Commissioning and data center handover processes
- Ability to operate in high-growth, ambiguous environments with limited structure.
- Strong executive presence and ability to influence without authority.
What sets you apart
- Extreme ownership mindset - you treat capacity delivery as a personal SL
Additional Information
Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications, without the hassle of managing hundreds of GPUs or TPUs. Cerebras' current customers include top model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of scale, transforming key workloads with ultra-high-speed inference. Thanks to the groundbreaking wafer-scale architecture, Cerebras Inference offers the fastest Generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. This order-of-magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.