Sr Manager, Site Reliability Engineering
External, Full-time, On-site, Today
Capacity Planning, Compliance, Incident Response, Observability, Performance Optimization, Site Reliability Engineering
About the role
We are building a high-performing Site Reliability Engineering (SRE) organization responsible for ensuring the availability, scalability, and resiliency of a large-scale proprietary Payments Processing Platform. The team is composed of experienced engineers with deep platform and payments expertise, focused on evolving from traditional production support to proactive reliability engineering. We partner closely with development, infrastructure, and business teams to deliver resilient systems, improve service performance, and enable continuous delivery at scale.
Responsibilities
- Lead and scale SRE teams responsible for the reliability, availability, and performance of critical payments platforms
- Define and drive Service Level Objectives (SLOs), SLIs, and error budgets to ensure measurable reliability outcomes
- Own incident management strategy, including major incident response, root cause analysis, and blameless postmortems
- Drive automation and toil reduction across operational processes to improve efficiency and scalability
- Partner with development teams to embed reliability engineering practices into the software development lifecycle
- Establish and standardize observability practices (monitoring, alerting, logging, tracing) across platforms
- Ensure platforms meet performance, scalability, and resiliency requirements aligned to business and regulatory needs
- Lead capacity planning and performance optimization initiatives across distributed systems
- Collaborate with product, infrastructure, and compliance teams to ensure secure, compliant, and highly available services
- Build and manage high-performing teams (managers and engineers), including hiring, performance management, and capability development
- Drive the transformation from reactive support models to a proactive reliability engineering culture
- Provide strategic input into platform modernization, architecture decisions, and reliability roadmaps
- Monitor execution across key reliability initiatives and value streams, ensuring alignment to business priorities
What you bring
- Strong experience leading SRE, production engineering, or platform engineering teams
- Deep understanding of distributed systems,
Privacy Statement
FIS is committed to protecting the privacy and security of all personal information that we process in order to provide services to our clients. For specific information on how FIS protects personal information online, please see the Online Privacy Notice.
Sourcing Model