Staff Site Reliability Engineer
About the role
Nubank was founded in 2013 to free people from a bureaucratic, slow and inefficient financial system. Since then, through innovative technology and outstanding customer service, the company has been redefining people's relationships with money across Latin America. With operations in Brazil, Mexico, and Colombia, Nubank is today one of the largest digital banking platforms and technology-leading companies in the world. Nubank was listed on the NYSE in 2021 and was recently named one of Time Magazine's 100 Most Influential Companies. For more information, visit www.nu.com.br.
As Nubank expands globally, maintaining high operational maturity is critical to our mission of empowering customers. This role exists to bridge the gap between rapid scaling and bulletproof resilience. You will be the technical North Star for our Resilience area. You will act as a force multiplier by partnering with product squads to scale reliability best practices and design safe deployment guardrails. Your primary impact will come from solving complex architectural challenges, mentoring engineers, and shaping the long-term resilience strategy for one of the world's largest digital banks.
Responsibilities
- Define Technical Strategy: Lead the long-term roadmap for reliability and resilience to align with global expansion and regulatory requirements.
- Drive Resilience Engineering: Execute Chaos Engineering experiments and Disaster Recovery simulations to mitigate systemic vulnerabilities before they impact customers.
- Establish Operational Guardrails: Implement robust SLOs and SLIs across the organization to help product teams balance innovation speed with system stability.
- Enable Engineering Teams: Provide product squads with the training and architectural patterns necessary to improve their independent operational excellence.
Required Qualifications
- Expertise in Cloud Resilience: Track record of architecting and maintaining high-availability systems in public cloud environments, preferably AWS.
- Incident Lifecycle Mastery: Deep experience in advanced root cause analysis and creating feedback loops that prevent incident recurrence.
- Observability Proficiency: Hands-on experience defining and implementing SLOs, SLIs, and error budgets in distributed microservices architectures.
- Chaos Engineering Practice: Real-world experience implementing Chaos Engineering and Disaster Recovery planning in production-scale environments.
Requirements
- Experience setting technical direction and coordinating large-scale projects across multiple teams.
- Background in Site Reliability Engineering (SRE).
- Familiarity with multi-region or multi-cell architecture patterns.
Benefits