Senior Site Reliability Engineer
About the role
We're looking for a Senior Site Reliability Engineer who takes extreme ownership of production systems and thrives in a collaborative, fast-paced environment. You bring deep hands-on experience across infrastructure, data platforms, and delivery pipelines - and you hold yourself accountable for reliability outcomes.
Responsibilities
- Own reliability and uptime of production infrastructure including Kubernetes clusters and data platforms
- Lead Kubernetes version upgrades with zero-downtime strategies across large-scale environments
- Apply a deep understanding of data pipelines to ensure their reliability, observability, and scalability end-to-end
- Manage and scale ELK stack for centralized logging and observability across services
- Operate and optimize ClickHouse clusters for high-throughput analytical workloads
- Administer Kafka clusters - tuning, scaling, and ensuring fault-tolerant message delivery
- Participate in and continuously improve on-call rotations, runbooks, and incident response processes
- Drive automation across infrastructure provisioning, deployments, and operational toil
- Collaborate closely with engineering, data, and product teams to embed reliability from day one
Required Qualifications
- 8+ years of experience in SRE, DevOps, or infrastructure engineering
- Hands-on experience with Kubernetes including version upgrade planning, node pool migrations, and zero-downtime rollouts
- Strong experience with Kafka - operations, tuning, and scaling in production
- Proficiency with ELK stack (Elasticsearch, Logstash, Kibana) for log management and observability
- Experience operating ClickHouse or similar columnar databases at scale
- Solid background in building and maintaining data pipelines reliably in production
- Infrastructure as Code with Terraform - modules, state management, and multi-environment setups
- CI/CD expertise using Flux, Jenkins, or Spinnaker
- Strong Git practices - branching strategies, GitOps workflows
- Proficiency in at least one programming/scripting language (Python, Go, Bash)
- Proven on-call experience with a track record of improving alert quality and reducing MTTR
- Strong automation mindset - eliminate toil, build durable solutions
Requirements
- Extreme ownership - you don't wait to be asked, you drive problems to resolution
- Collaborative team player who lifts those around them
- Clear communicator across engineering and non-engineering stakeholders
Additional Information
At SolarWinds, we're a people-first company. Our purpose is to enrich the lives of the people we serve, including our employees, customers, shareholders, partners, and communities. Join us in our mission to help customers accelerate business transformation with simple, powerful, and secure solutions.
The ideal candidate thrives in an innovative, fast-paced environment and is collaborative, accountable, ready, and empathetic. We're looking for individuals who believe they can accomplish more as a team and create lasting growth for themselves and others. We hire based on attitude, competency, and commitment. Solarians are ready to advance our world-class solutions in a fast-paced environment and accept the challenge to lead with purpose. If you're looking to build your career with an exceptional team, you've come to the right place. Join SolarWinds and grow with us!
We work in a 3+2 hybrid mode: a minimum of 3 days in the office (Tuesdays and Wednesdays are mandatory) and a maximum of 2 days working from home. Our office is located at Holandská 873/6, Brno - Štýřice, 639 00. We employ only via an employment contract - full-time employment (HPP).