Senior Software Engineer - Reliability

External

Nubank · Miami

Full-time · On-site · 1w ago

Compliance · Incident Response · Observability · SAFe · System Design

About the role

Nu is one of the largest digital financial platforms in the world, with more than 127 million customers across Brazil, Mexico, and Colombia. Guided by our mission to fight complexity and empower people, we are redefining financial services in Latin America, and this is still just the beginning of the purple future we're building. Listed on the New York Stock Exchange (NYSE: NU), we combine proprietary technology, data intelligence, and an efficient operating model to deliver financial products that are simple, accessible, and human. Our impact has been recognized by global rankings such as Time 100 Most Influential Companies, Fast Company's Most Innovative Companies, and Forbes World's Best Banks. Visit our institutional page: https://international.nubank.com.br/careers/

The U.S. Market team is launching a differentiated financial product in the largest and most demanding financial market in the world. We're iterating quickly on real customer signals while building systems that will eventually serve customers at Nubank scale. That combination of early-stage velocity, regulatory weight, and high reliability expectations requires an engineer whose primary mandate is reliability, scale, and operational excellence. This role exists to make sure the systems we're building today can be trusted in production tomorrow, and to set the bar for what "production-ready" means on this team. The engineer in this role delivers their mandate by writing production code, shaping architecture, and engineering the systems themselves - not by absorbing operational load.

You'll be responsible for

Define and operate against SLOs. Establish meaningful SLIs and SLOs with product and engineering partners, manage error budgets, and use them as real inputs to prioritization rather than dashboards no one reads.

Build the observability layer. Improve metrics, logs, traces, and alerting so issues are detected early, attributed precisely, and debugged with code-level confidence. Push instrumentation upstream into the services we own.

Lead incident response. Act as incident commander when needed, drive blameless postmortems, and turn findings into concrete engineering work that lands. Build the muscle in the team so this isn't centralized in any one person.

Reduce toil through engineering. Identify repetitive operational work and eliminate it with software - automation, self-healing behavior, better defaults, better tooling - rather than absorbing it as ongoing overhead.

Production hardening. Stress-test designs for partial failure, dependency degradation, traffic spikes, and adversarial inputs. Run capacity and performance work before incidents arise. Ensure resiliency primitives are tuned and working correctly.

Make change safe and fast. Improve release safety through progressive delivery, feature flags, canaries, rollbacks, and tested migrations. Help the squad ship faster and with lower blast radius.

Improve developer experience, especially where it removes operational friction or improves change safety. Where internal tooling or platform gaps slow the team down, build or contribute the fix. Prefer leverage over heroics.

Partner across disciplines. Work closely with product, platform, security, compliance, and other engineering teams. Translate reliability and risk tradeoffs into language each audience can act on.

Raise the engineering bar. Mentor engineers, review hard designs and PRs, and shape technical standards across the squad. Lead through clarity and judgment, not authority.

We are looking for a person who has

A track record of owning services in production - not just shipping them, but being the engineer responsible for how they behave under real load and real failure.

Experience defining and operating against SLOs/SLIs, and using error budgets to influence engineering and product decisions.

Experience leading incident response and writing postmortems that produced durable improvements.

Hands-on experience with observability tooling (metrics, structured logging, distributed tracing) and using it to diagnose nontrivial production issues.

Deep system design experience: distributed se

Benefits

Equity / stock options

Additional Information

About Nu

Nu is one of the largest digital financial services platforms in the world, serving 135 million customers across Brazil, Mexico, and Colombia. The company has been leading an industry transformation by leveraging data and proprietary technology to develop innovative products and services. Guided by its mission to fight complexity and empower people, Nu caters to customers' complete financial journey, promoting financial access and advancement with responsible lending and transparency. The company is powered by an efficient and scalable business model that combines low cost to serve with growing returns. Nu's impact has been recognized in multiple awards, including Time 100 Most Influential Companies, Fast Company's Most Innovative Companies, and Forbes World's Best Banks. Visit our institutional page: https://www.nu.com/2026-en

Interested in this role?

Apply on the company's website.