Staff Engineer, Observability
About the role
We are seeking a highly experienced Staff Software Engineer to lead our efforts in building, maintaining, and optimizing highly scalable, reliable, and secure systems. The Observability team is responsible for deploying and maintaining critical infrastructure at CoreWeave including our logging, tracing, and metrics platforms as well as the pipelines that feed them.
Responsibilities
- Lead and mentor engineers, fostering a culture of collaboration and continuous improvement.
- Scale logging, tracing, and metrics platforms to support a global datacenter footprint.
- Develop and refine monitoring and alerting to enhance system reliability.
- Advise engineers across CoreWeave on optimal usage of Observability systems.
- Automate interactions with CoreWeave's Compute Infrastructure layer.
- Manage production clusters and ensure development teams follow best practices for deployments.
Required Qualifications
- 7+ years of experience in Software Engineering, Site Reliability Engineering, DevOps, or a related field.
- Deep expertise across all observability pillars using tools like ClickHouse, Elastic, Loki, VictoriaMetrics, Prometheus, Thanos, and/or Grafana.
- Expertise in Kubernetes, containerization, and microservices architectures.
- Proven track record of leading incident management and post-mortem analysis.
- Excellent problem-solving, analytical, and communication skills.
Requirements
- Experience running and scaling observability tools as a cloud provider.
- Experience administering large-scale Kubernetes clusters.
- Deep understanding of data-streaming systems.
- To fulfill our obligation to protect client data, successful applicants offered employment with CoreWeave will be required to complete a basic criminal record check, conducted in compliance with GDPR. Employment offers are conditional upon receiving satisfactory check results.
Additional Information
CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at www.coreweave.com. We're proud to be a Living Wage accredited Employer.