Sr Engineer Site Reliability

Empower · Ka Bangalore
Contract · On-site · Today
ArgoCD · AWS · Chaos Engineering · CI/CD · Compliance · Datadog

Requirements

Required:
  • Bachelor's degree in Computer Science, Information Systems or similar emphasis, or equivalent experience
  • 4-7 years of experience in Site Reliability Engineering (or equivalent), with a track record of operating large-scale production systems
  • Deep expertise in AWS, with hands-on experience across a broad range of services and architectural patterns
  • Advanced Kubernetes knowledge, including custom resources, operators, and cluster federation concepts
  • Expert-level proficiency in Terraform, including module development, state management, and complex workflow orchestration
  • Strong programming skills in Python and/or Go, with ability to develop production-quality tools and services
  • Production experience implementing observability at scale using Datadog, Splunk, or similar platforms
  • Demonstrated experience establishing and maintaining CI/CD pipelines at enterprise scale
  • Deep understanding of GitOps principles and experience with tools like ArgoCD or Flux
  • Proven ability to lead complex incident response and conduct thorough postmortems
  • Strong understanding of networking, security, and infrastructure design patterns
  • Experience mentoring engineers and conducting technical training

Preferred:
  • Experience in financial services or payments industry
  • Deep knowledge of compliance frameworks (SOC 2, PCI DSS, FINRA)
  • AWS certifications (Solutions Architect Professional, DevOps Engineer Professional)
  • CKA and/or CKAD certifications
  • Experience with service mesh implementations (Istio, Linkerd, Consul)
  • Background in chaos engineering and fault injection testing
  • Experience with FinOps and cloud cost optimization
  • Contributions to open-source projects in the SRE/DevOps space
  • Experience implementing O

Benefits

  • Vision insurance
  • Flexible schedule

Additional Information

Our vision for the future is based on the idea that transforming financial lives starts by giving our people the freedom to transform their own. We have a flexible work environment, and fluid career paths. We not only encourage but celebrate internal mobility. We also recognize the importance of purpose, well-being, and work-life balance. Within Empower and our communities, we work hard to create a welcoming and inclusive environment, and our associates dedicate thousands of hours to volunteering for causes that matter most to them. Chart your own path and grow your career while helping more customers achieve financial freedom. Empower Yourself.

As a Senior Site Reliability Engineer at Empower, you'll be a technical leader driving reliability initiatives across critical financial services infrastructure. You'll architect solutions for complex operational challenges, mentor engineers, and establish best practices that ensure our platform can scale to serve millions of customers with the reliability they expect from a Fortune 500 fintech company.

ESSENTIAL FUNCTIONS:

Technical Leadership:
  • Design and implement highly available, fault-tolerant systems supporting critical financial transactions
  • Architect infrastructure solutions using AWS best practices, optimizing for cost, performance, and reliability
  • Lead complex incident response efforts, coordinating across teams to restore service rapidly
  • Drive postmortem processes for high-severity incidents, ensuring meaningful action items are identified and completed
  • Establish and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for key services
  • Design and implement disaster recovery strategies and business continuity plans

Operational Excellence:
  • Build sophisticated Infrastructure as Code (IaC) solutions using Terraform, incorporating advanced patterns like modules, workspaces, and state management
  • Architect and optimize multi-cluster EKS environments, implementing pod autoscaling, cluster autoscaling, and resource optimization
  • Design observability strategies using Datadog and Splunk, creating meaningful metrics, dashboards, and alerting that enable proactive problem detection
  • Implement progressive delivery mechanisms (canary deployments, blue-green deployments) within GitOps workflows
  • Build automation frameworks that significantly reduce operational toil and improve team efficiency

Collaboration & Influence:
  • Partner with development teams to improve application reliability, conducting design reviews and providing architectural guidance
  • Mentor and guide junior and intermediate SREs, conducting code reviews and providing technical coaching
  • Contribute to architectural decisions that impact platform reliability and scalability
  • Evangelize SRE best practices across the engineering organization
  • Participate in on-call rotations and drive improvements to reduce on-call burden

Compliance & Security:
  • Implement and maintain zero-trust security controls across infrastructure
  • Ensure systems meet financial services regulatory requirements and internal compliance standards
  • Conduct security reviews of infrastructure changes and deployment processes
  • Participate in audit preparations and respond to compliance-related inquiries

