Production Engineer, Support tooling (Tooling and Frameworks)

CoreWeave · Livingston, NJ
$139K–$204K/yr · Full-time · On-site · 1w ago
Kubernetes, Leadership, Linux, Observability, Python, Routing

About the role

The Senior Production Engineering team sits at the heart of CoreWeave's reliability efforts. In this role, you'll partner closely with our Support/CX teams to build, operate, and evolve internal tooling that enables a "Direct‑to‑Expert" support model at scale. You'll define and ship AI‑assisted workflows, self‑service diagnostics, and platform integrations that reduce time‑to‑resolution and improve customer experience across our cloud.

Responsibilities

  • Design, build, and own support-facing tools for case triage, intelligent routing, and expert engagement, integrating with incident and change management workflows.
  • Develop AI‑powered assistants and automations that accelerate root‑cause discovery, knowledge retrieval, and resolution quality.
  • Create and maintain dashboards, alerts, and signals that surface tooling issues early; integrate observability into new tooling to reduce MTTR.
  • Build self-service and guided diagnostics that empower Support/CX to resolve common issues and collect high‑quality context for escalations.
  • Codify reliability and support practices into services, APIs, and Kubernetes-native controllers/operators where appropriate.
  • Partner with engineering leadership and internal stakeholders to prioritize roadmap initiatives, land adoption, and measure business impact.
  • Participate in an on‑call rotation for the tooling you own.

What you've worked on (Minimum qualifications)

  • 4+ years of software or infrastructure engineering experience building and operating production services.
  • Proficiency in Go or Python (or equivalent experience).
  • Strong fundamentals in Linux, containers, and Kubernetes; comfortable debugging in distributed systems.
  • Experience with observability (metrics/logs/traces) and using data to improve reliability and support outcomes.
  • Demonstrated experience with incident management and steady‑state operational excellence (e.g., progressive delivery, testing strategies, error budgets, fault‑tolerant design).
  • Comfort collaborating with multiple stakeholders (Support/CX, Product, SRE, and service owners).

Requirements

  • Experience integrating or building support/operations tooling (e.g., ticketing/incident systems, status page, knowledge management, chat/alerting integrations).
  • Experience automating manual workflows and stitching together productivity platforms.
  • Familiarity with AI/ML tooling for retrieval, summarization, or copilot‑style assistance.
  • Experience codifying operational practices into Kubernetes controllers, operators, or platform services.

Wondering if you're a good fit?

We value diverse experiences. Even if you don't match every bullet, we'd still love to hear from you, especially if you're excited about reliability tooling, customer experience, and partnering across teams to make CoreWeave better every day.

Benefits

The range we've posted represents the typical compensation range for this role. To determine actual compensation, we review the market rate for each candidate which can include a variety of factors. These include qualifications, experience, interview performance, and location.

In addition to a competitive salary, we offer a variety of benefits to support your needs. The benefits below reflect our US-based offerings; for roles in other locations, benefits vary and are shared during the hiring process. These include:

  • Medical, dental, and vision insurance - 100% paid for by CoreWeave
  • Company-paid Life Insurance
  • Voluntary supplemental life insurance
  • Short and long-term disability insurance
  • Flexible Spending Account
  • Health Savings Account
  • Tuition Reimbursement
  • Ability to Participate in Employee Stock Purchase Program (ESPP)
  • Mental Wellness Benefits through Spring Health
  • Family-Forming support provided by Carrot
  • Paid Parental Leave
  • Flexible, full-service childcare support with Kinside
  • 401(k) with a generous employer match
  • Flexible PTO
  • Catered lunch each day in our office and data center locations
  • A casual work environment
  • A work culture focused on innovative

Health insurance, Dental insurance, Vision insurance, 401(k), Paid time off, Flexible schedule, Equity / stock options, Performance bonus, Parental leave

Additional Information

CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at www.coreweave.com.


Interested in this role?

Apply on the company's website.