Site Reliability Engineer
About the role
In this role, you will strengthen the SRE Platform team's mission by advancing the foundational platforms that automate manual workflows and improve system reliability. Your work will keep our staging environments stable and production-like, so that QA and development teams can test, validate, and deploy their applications with confidence. You will also take part in the weekly on-call rotation, supporting consistent and dependable infrastructure performance.

Responsibilities
- Automate and optimize operational processes
- Enhance and maintain the observability stack
- Manage test and staging environments
- Develop and support critical production components
- Handle and resolve production incidents
- Participate in the on-call rotation

Requirements
- Strong teamwork and collaboration skills
- Solid understanding of SRE concepts, including SLIs, SLOs, SLAs, and error budgets
- Proficiency in Python or another scripting language
- Strong grasp of software engineering principles
- Hands-on experience with observability and monitoring tools such as Prometheus and Grafana
- Familiarity with logging stacks (e.g., ELK, Loki) and tracing systems (e.g., Jaeger, Tempo)
- Understanding of relational databases (RDBMS) and Redis
- Experience working with Kubernetes and related tooling (e.g., Helm)