Site Reliability Engineer (SRE)
Responsibilities
- Operate and enhance Kubernetes platforms across AWS, Azure, and on-premise environments.
- Lead incident response, problem management, and root cause analysis activities.
- Deliver cluster lifecycle management including upgrades, patching, node pool management, CNI and CSI configuration, ingress management, and Rancher operations.
- Own observability strategy including dashboards, alerting, monitoring, and definition of SLOs and SLIs.
- Implement GitOps practices using Fleet and reduce operational toil through automation and governance.
- Apply secure API gateway and Web Application Firewall (WAF) patterns.
- Design and support distributed systems including event brokers and asynchronous messaging architectures.
- Maintain platform security posture including CVE remediation, GRC controls, and security scanning pipelines.
- Provision and manage infrastructure using Terraform and Crossplane as orchestration layers.
- Implement and maintain CI/CD pipelines using Concourse, GitHub Actions, and Azure DevOps.
- Ensure compliance with PCI DSS and GDPR security patterns.
Qualifications
- Deep expertise in Kubernetes, Rancher, GitOps, Linux, and cloud networking.
- Strong experience operating in hybrid cloud environments across AWS, Azure, and on-premise platforms.
- Strong automation and scripting skills in Python, Go, Bash, PowerShell, or .NET.
- Proven experience with Infrastructure as Code using Terraform and Crossplane.
- Experience implementing and managing observability tooling including Grafana, Prometheus, Jaeger or Tempo, CloudWatch, Loki, and OpenTelemetry.
- Strong understanding of API gateway and Web Application Firewall patterns.
- Experience working with distributed systems and event-driven architectures.
- Experience operating within regulated environments including PCI DSS and GDPR.
- Knowledge of service mesh technologies such as Istio or Kuma is desirable.
- AWS operational experience is advantageous.
- Experience within payments or other regulated industries is beneficial.
- All your information will be kept confidential according to EEO guidelines.
- Candidates must be legally authorized to live and work in the country where the position is based, without requiring employer sponsorship.
- HelloKindred is committed to fair, transparent, and inclusive hiring practices. We assess candidates based on skills, experience, and role-related requirements.
- We appreciate your interest in this opportunity. While we review every application carefully, only candidates selected for an interview will be contacted.
Additional Information
Anticipated Contract End Date/Length: August 28, 2026
Work Set Up: Hybrid (must be eligible for BPSS)

Our client in the Information Technology and Services industry is looking for a Site Reliability Engineer (SRE) to support and enhance a complex, multi-cloud Kubernetes platform environment. This role is focused on driving platform reliability, automation, observability, and security across AWS, Azure, and on-premise infrastructure. The successful candidate will play a key role in improving uptime, reducing operational toil through GitOps and automation, strengthening platform security posture, and enabling scalable onboarding of new tenants and workloads. This is a hands-on engineering role operating within regulated environments and modern cloud-native architectures.