Senior Site Reliability Engineer I
External | Full-time | On-site | 1mo ago | 30+ days old, may be filled
ArgoCD, Compliance, Grafana, Helm, Java, Kubernetes
Responsibilities
- You will be part of the Observability team within Axon's Site Reliability organization - a focused team responsible for Axon's metrics, logging, tracing, and alerting infrastructure across dozens of environments globally.
- Own and evolve Axon's distributed tracing infrastructure, including Jaeger and OpenTelemetry-based instrumentation, driving adoption across Axon's service-oriented architecture
- Build and operate Axon's log aggregation platform (Grafana Loki + Alloy), expanding use cases beyond Kubernetes event logs and reducing organizational dependency on expensive third-party log tooling (including Splunk)
- Maintain and improve Axon's metrics infrastructure (Cortex, Prometheus, Grafana) - the foundation for alerting, dashboards, and SLO tracking across all of Axon's environments
- Write internal tooling and automation that makes observability self-service: toolkit commands, agentic on-call helpers, runbook generation, and dashboard scaffolding
- Manage observability infrastructure as code via Terraform, CDK, ArgoCD, and Helm - including capacity management and cybersecurity and compliance requirements - and participate in the on-call rotation
- Work directly with engineering teams across Axon to define instrumentation standards, drive tracing adoption, and help teams build meaningful SLOs for their services
Requirements
- Bachelor's Degree in Computer Science, Engineering, or an equivalent highly technical field
- 7+ years of experience in SRE, platform engineering, or infrastructure engineering
- Strong Linux systems fundamentals and comfort working in Kubernetes-based environments
- Hands-on experience with one or more components of the LGTM stack: Loki, Grafana, Tempo/Jaeger, or Mimir/Cortex
- Experience with infrastructure as code - Terraform strongly preferred, CDK is a plus
- Experience with any of: Golang, Python, or Java
- United States citizen - able to gain CJIS clearance for full US production access
- Experience deploying and operating distributed tracing systems (OpenTelemetry, Jaeger, Tempo, or similar)
- Familiarity with OpenTelemetry - instrumentation, collectors, and pipelines
- Experience with GitOps workflows
- Exposure to 24/7 high-volume systems with formal SLA requirements
- Experience with agentic AI tooling or building LLM-powered developer tools
- Ability to debug complex multi-service distributed systems
Benefits that Benefit You
- Competitive salary and 401k with employer match
- Discretionary time off
- Paid parental leave for all
- Medical, Dental, Vision plans
- Fitness Programs
- Emotional & Development Programs
- And yes, we have snacks in our offices
- Benefits listed herein may vary depending on the nature of your employment and the location where you work.
- Axon is a total compensation company, meanin
Benefits
Dental insurance, Vision insurance, 401(k), Remote work options, Parental leave
Additional Information
Join Axon and be a Force for Good. At Axon, we're on a mission to Protect Life. We're explorers, pursuing society's most critical safety and justice issues with our ecosystem of devices and cloud software. Like our products, we work better together. We connect with candor and care, seeking out diverse perspectives from our customers, communities and each other. Life at Axon is fast-paced, challenging and meaningful. Here, you'll take ownership and drive real change. Constantly grow as you work hard for a mission that matters at a company where you matter.