Technical Product Manager - AI Compute Platform

Nebius · Amsterdam, Netherlands

Full-time · Remote · Today

API Design · Capacity Planning · IAM · Kubernetes · Observability

About the role

Our customers build the frontier of AI on top of Nebius - training state-of-the-art models, running production inference at scale, shipping the research and products that define where the field is going next. We are building the AI cloud that the people building the frontier of AI choose deliberately - not on price, not on raw capacity, but on how it works to use it day to day.

To do that, we are growing the AI Compute Platform product team and hiring multiple Technical Product Managers across the full surface of the platform. Your scope will be defined by what you bring. We will match your technical strengths, customer experience, and product instincts to the area of the platform where you can have the most impact. The platform is broad - and at our scale, every slice is mission-critical.

If you want to help build the best AI cloud in the world - and you have the technical depth to engage engineering leaders as a peer (not as a translator) and the comfort to talk to customers directly - this team is for you.

The platform you'll help build:

Hardware platforms & launch - bringing new GPU and CPU platforms (GB300, Vera Rubin, ARM/Grace, future generations) to production with full launch readiness across the stack.

Cluster lifecycle & fleet operations - new region launches, 100,000+ GPU cluster bring-up, platform sharding and allocation architecture, release engineering, host-lifecycle automation, operational efficiency.

Reliability & Mission Control - autohealing, health checks, SLA, fault-tolerant training, MTTR reduction, customer trust at scale, observability as a product.

Customer experience & developer surface - Compute APIs, console, CLI, IMDS and in-VM signals, self-service workflows, notifications, customer-facing observability, unified UX across the product line.

GPU & InfiniBand foundational services - drivers, firmware, NCCL, IB/RoCE, NVLink topology, the foundational layer everything else builds on.

Managed runtime platforms - Soperator (Slurm-on-Kubernetes) and MK8S (Managed Kubernetes for AI workloads), powering training and inference for frontier labs.

Platform integrations & emerging workloads - Token Factory integration, RL and agentic workload infrastructure, capacity sharing, new business surfaces as they emerge.

Cross-platform program & delivery - NVIDIA partnership programs, major-maintenance orchestration, cross-stream releases.

You will own one of the slices of this platform end-to-end - from strategy and roadmap through delivery, adoption, and measurable outcomes.

Your responsibilities will include (regardless of which slice you own):

Own end-to-end product responsibility for your area - strategy, roadmap, discovery, delivery, adoption, measurable customer and platform outcomes.

Design and own the platform contracts customers depend on - APIs, semantics, system events, customer-facing surfaces, operational behavior - at hyperscaler quality.

Drive cross-team execution across platform engineering, networking, storage, Soperator/MK8S, observability, IAM, billing, capacity planning, support, and product design.

Turn customer pain into product commitments through structured discovery - interviews, usage analytics, support patterns, incident postmortems. Close the loop so the same class of failure or friction does not recur.

Engage engineering as a technical peer - debate API design, reason about system trade-offs, judge the quality of platform internals, and push back when the design is wrong.

Define and own success metrics - what you ship is measured by what changed for the customer or the platform, not by the size of the spec.

Be the product voice that customer-facing teams (Support, CX, TAMs) escalate to when a system behavior, API contract, or operational pattern needs a product decision, not a workaround.

We expect you to have:

6+ years in Product Management, Platform PM, Infrastructure PM, or SRE / Engineering Lead with strong product instincts.

Strong technical foundation and cloud-infrastructure depth - comfort reasoning about API semantics, control-plane vs data-plane behavior, system events and lifecycle, multi-tenant operational realities. You can engage engineering leaders as a peer, not as a translator.

Benefits

Health insurance

Additional Information

About Nebius: Nebius is leading a new era in cloud infrastructure for the global AI economy. We are building a full-stack AI cloud platform that supports developers and enterprises from data and model training through to production deployment, without the cost and complexity of building large in-house AI/ML infrastructure. Built by engineers, for engineers. From large-scale GPU orchestration to inference optimization, we own the hard problems across compute, storage, networking and applied AI. Listed on Nasdaq (NBIS) and headquartered in Amsterdam, we have a global footprint with R&D hubs across Europe, the UK, North America and Israel. Our team of 1,500+ includes hundreds of engineers with deep expertise across hardware, software and AI R&D.
