Senior Production Engineer
About the role
The SRE team owns reliability and infrastructure for Anduril's cloud deployments. We operate Kubernetes clusters, Terraform infrastructure, and observability platforms across 10+ production environments supporting active defense contracts. When platform services break under real operational load, we're the team that fixes them, often at the code level and not just the config level.
We are looking for a Senior Production Engineer to join our team in Costa Mesa, CA (or DC). In this role, you will diagnose and fix stability vulnerabilities in core platform services that cause cascading failures in multi-tenant cloud deployments. You will write production Go to implement resilience patterns (leader election, circuit breakers, failure domain isolation) directly in service code. This requires deep experience with distributed systems, debugging complex failure modes across service boundaries, and writing production-quality Go. If you thrive on fixing hard reliability problems in live systems rather than building greenfield projects, this role is for you.