Senior Platform Engineer II, Compute Services
About the role
- Champion reliability initiatives for Kubernetes application deployments: Advocate for best practices to ensure high availability, scalability, and resilience of applications in Kubernetes, focusing on robust testing, secure pipelines, and efficient resource use.
- Administer multi-tenant Kubernetes platforms: Manage complex multi-tenant Kubernetes clusters, configuring access, quotas, and security for isolation and optimal resource allocation while upholding SLAs.
- Perform lifecycle and day 2 operations on clusters: Execute Kubernetes cluster lifecycle, including provisioning, patching, monitoring, backup, disaster recovery, and troubleshooting.
- Deep dive into reliability issues: Conduct in-depth analysis and root cause identification for complex reliability incidents in Kubernetes, utilizing advanced debugging and monitoring tools to propose preventative measures.
- Perform on-call duties: Respond to critical alerts and incidents outside business hours, providing timely resolution to minimize disruptions, collaborating with teams, and communicating clearly.
Requirements
- Bachelor's in CS, Engineering, or related field, or equivalent experience preferred.
- CKA or similar certification is highly desired.
- 5+ years administering multi-tenant SaaS Kubernetes (EKS, AKS, GKE).
- Strong GitOps/DevOps experience with Argo CD or similar Helm chart management.
- Proven Docker and containerization experience.
- Strong Linux OS experience.
- Proficient in Go.
- Excellent problem-solving, debugging, and analytical skills.
- Strong communication and collaboration.
Preferred:
- Master's degree in Computer Science, Engineering, or a related field.
- Experience with performance profiling and optimization of distributed systems.
- Knowledge of network protocols and distributed consensus algorithms.
Wondering if you're a good fit? We believe in investing in our people, and value candidates who can bring their own diversified experiences to our teams - even if you aren't a 100% skill or experience match.
Why CoreWeave?
- Be Curious at Your Core
- Act Like an Owner
- Empower Employees
- Deliver Best-in-Class Client Experiences
- Achieve More Together
Additional Information
CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at www.coreweave.com.