Senior Manager, Observability
About the role
CoreWeave is seeking a Senior Manager, Observability Engineering to lead a team responsible for building, scaling, and operating observability systems across metrics, logs, traces, and telemetry pipelines. In this role, you will define strategy and roadmap, drive platform reliability and performance improvements, and guide architectural decisions across observability infrastructure. You will partner closely with infrastructure, platform, security, and application engineering teams to improve instrumentation and production visibility. This role combines technical leadership, operational ownership, and team management to ensure observability platforms scale with business and customer needs.
Requirements
- 8+ years of software engineering experience with production systems at scale
- 4+ years of engineering management experience leading senior engineers and technical leads
- Experience building and operating observability platforms across logs, metrics, traces, and alerting in distributed systems
- Knowledge of reliability engineering concepts including SLOs, SLIs, incident management, error budgets, and fault-tolerant design
- Experience scaling telemetry systems including collection pipelines, storage backends, and query layers
- Experience with distributed systems, performance engineering, and trade-offs involving scale, resilience, and cost
- Experience partnering with infrastructure, security, and application engineering teams to drive platform adoption
- Experience hiring and managing engineering teams
Preferred:
- Experience with OpenTelemetry, Grafana, Prometheus-compatible systems, log aggregation, and distributed tracing tools
- Experience operating cloud-native infrastructure, including Kubernetes environments
- Experience supporting large-scale cloud, developer platforms, or AI/ML infrastructure
- Familiarity with capacity planning for high-ingest telemetry systems
- Experience scaling platforms in high-growth environments
Wondering if you're a good fit?
We believe in investing in our people, and value candidates who can bring their own diversified experiences to our teams, even if you aren't a 100% skill or experience match. Here are a few qualities we've found compatible with our team. If some of this describes you, we'd love to talk.
- You love solving complex systems problems across infrastructure, reliability, and developer productivity
- You're curious about improving observability, debugging workflows, and large-scale distributed systems
- You're an expert in balancing scalability, reliability, and cost in production environments
Why CoreWeave?
- Be Curious at Your Core
- Act Like an Owner
- Empower Employees
- Deliver Best-in-Class Client Experiences
- Achieve More Together
The base salary range for this role is zł369,000 to zł493,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determi
Additional Information
CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at www.coreweave.com. We're proud to be a Living Wage accredited Employer.