Sr. Manager SRE (Individual Contributor)

Capital One · Mexico City, Mexico

Full-time · Hybrid · Posted today

AWS · Chaos Engineering · CI/CD · Classification · Compliance · Datadog

Responsibilities

Define and maintain a 12-18 month technical vision and roadmap for GPN SRE in Mexico City - decompose destination architecture into deliverable steps, sequence investments, and align execution across teams
Drive reliability transformation across settlement, observability, and automation domains - establish SLOs, error budgets, severity frameworks, and operational standards that teams build against
Pioneer AI and agentic automation approaches - design and build AI-driven solutions (using Claude Code, Copilot CLI, and LLM frameworks) for alert classification, runbook generation, automated remediation, and incident analysis; set patterns that other engineers extend
Own the technical strategy for domain-specific knowledge ramp-up: identify which domain expertise requires deep engineering investment vs. documentation, and architect systems that reduce reliance on tribal knowledge
Lead cross-team technical initiatives - drive observability platform convergence, standardize on COF tooling, and eliminate arbitrary uniqueness across towers
Serve as the senior escalation point for complex production incidents - diagnose cascading failures across distributed systems (storage, network, application), drive resolution, and ensure durable fixes land
Architect automation for high-risk operational processes - certificate rotation, compliance artifact generation, settlement cycle validation - ensuring security and reliability are built in from design
Mentor and elevate engineers across teams - conduct design reviews, establish engineering standards, coach on debugging and system thinking, and create an environment where Principal Associates and Managers grow into domain experts
Introduce and advocate for engineering practices that raise the bar - AI engineering, innersourcing, reuse over rebuild, open source contribution, blameless postmortems, and chaos engineering
Influence beyond the CDMX site - partner with US and UK leadership on architectural decisions, represent CDMX engineering in cross-org forums, and shape GPN-wide reliability strategy

What Success Looks Like

Technical roadmap established and executing - teams are delivering against a clear, sequenced plan with measurable reliability OKRs
At least one domain (alert signal reduction or settlement automation) where CDMX operates autonomously without US/UK escalation, driven by systems and patterns you architected
AI-powered automation deployed in production - incident classification models, generated runbooks, or automated remediation that demonstrably reduces MTTR or toil
Engineering standards and patterns documented and adopted - design review process, observability standards, incident response framework, and automation patterns that scale with the team
Recognized as the technical authority for GPN SRE reliability - sought out across towers and geographies for architectural guidance, incident escalation, and strategic input
Multiple engineers grown through your mentorship - visible skill development in system design, debugging, and operational judgment across the CDMX teams

The Environment

Benefits

Vision insurance

Additional Information

WeWork Reforma Latino (97001), Mexico, Ciudad de Mexico, Ciudad de Mexico

We're building a Site Reliability Engineering center in Mexico City, and we're hiring a Senior Manager-level SRE to serve as the technical anchor for the site - defining the reliability vision, driving cross-team execution, and pioneering automation and AI-driven approaches that transform how we operate three payment networks at scale.

This is a strategic technical leadership role. You won't manage people directly, but you'll shape how multiple teams work - setting architectural direction for observability, automation, operational excellence, alert signal reduction, and reliability platform convergence. You'll be the most senior IC engineer in Mexico City, partnering with the Director (people leader) to translate organizational goals into technical roadmaps and ensuring the engineering quality bar stays high as the site scales.

You'll operate across the full landscape: batch settlement systems processing every domestic and international credit/debit transaction, real-time observability platforms that must detect failures before customers do, and AI-powered automation that eliminates the toil standing between us and a proactive reliability culture.
