Senior Technical Product Manager, Observability
About the role
Technical Product Managers at Nscale own the definition, delivery, and ongoing evolution of a slice of the Nscale platform, partnering with engineering, design, and go-to-market to turn customer and operational problems into shippable outcomes. As a Senior Technical Product Manager for Observability, you own the platform that gives customers and internal operators real-time visibility into their GPU fleet: the telemetry pipeline that scrapes data from physical infrastructure, the aggregation and storage layer, and the observability surfaces (logs, metrics, and traces) that enable fleet management, incident response, and alerting at scale. You partner daily with Fleet Software, Network Engineering, Data Centre Operations, and customer teams to make fleet health visible, actionable, and reliable as Nscale scales from a handful of deployments to a globally distributed fleet.
Responsibilities
- Own the roadmap for Nscale's observability platform: the telemetry pipeline, log and metrics aggregation, trace collection, and customer-facing APIs and dashboards that surface fleet health to customers and operators.
- Define how logs, metrics, and traces are captured from physical infrastructure, aggregated, and surfaced through the observability platform to enable customers to manage their fleet and handle incidents.
- Own alerting strategy and optimisation: define what matters, reduce noise, and ensure the right signal reaches the right person at the right time.
- Capture and prioritise new telemetry requirements as the fleet scales, working with engineering to extend coverage across new hardware, sites, and deployment types.
- Shadow incident reviews and site operations to turn recurring manual effort and visibility gaps into platform capabilities.
- Define and drive the metrics that matter: alert signal-to-noise ratio, time-to-detect, time-to-resolve, telemetry coverage, and platform reliability.
- Mentor junior PMs and raise the bar for PRDs, reviews, and product decisions across the team.
What you need
- 5-8 years in product management, with a track record owning significant areas in observability, infrastructure, or operations-facing products.
- Demonstrated experience building observability stacks: you have owned a product that captures and surfaces logs, metrics, and traces at scale, and you understand the architectural and UX tradeoffs involved.
- Hands-on experience with Prometheus, Loki, Mimir, Datadog, Grafana, or OpenTelemetry.
- Experience with deployment tooling in a data centre or infrastructure context, including provisioning workflows, networking automation, or zero-touch deployment pipelines.
- Experience building for operators and delivery teams (design engineers, project controllers, PMs, SREs, DC technicians) and a genuine appetite for their workflows.
- Strong technical fluency: you can lead architecture and trade-off discussions across telemetry pipelines, time-series storage, alerting systems, and observability integrations.
- A record of moving ambiguous operational problems to shipped outcomes that measurably improve visibility, incident response, or fleet reliability.
- Excellent written and verbal communication across engineers, operators, and executives.
Requirements
- Broader observability problem domain experience across different toolsets beyond the above stack.
- Familiarity with bare-metal provisioning tools (OpenStack Ironic, MAAS, or similar) or network automation tooling (NetBox, Nautobot, or similar).
- Degree in CS or engineering, or prior experience as an engineer, SRE, or infrastructure operator.
- Familiarity with GPU or accelerated compute infrastructure, data centre operations, or hyperscaler-style deployment at scale.
- Familiarity with ITSM tooling: Jira Service Management, ServiceNow, Zendesk, or Freshservice.
- Experience in high-growth environments where the product is being built alongside the fleet it monitors.
Join Nscale as we build a world-class AI cloud platform. If you're excited about owning the software that turns contracts into live GPU capacity, we'd love to hear from you!
At Nscale, we are committed to fostering an inclusive, diverse, and equitable workplace. We believe that a variety of perspectives enriches our work environment, and we encourage applications from candidates of all backgrounds, experiences, and abilities. We strongly encourage app
Benefits
Additional Information
About Nscale
Nscale is taking on the hyperscalers by building a vertically integrated GenAI cloud platform. We own the data centres, software, and applications that power today's AI stack using sustainable technology solutions. We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you'll build trust through openness and transparency, where everyone is inspired to do their best work. Collaboration is key, and we work together swiftly and respectfully, embracing adaptability and resilience in all we do.