Principal Associate SRE

External
Capital One · Mexico City, Mexico
Full-time · Hybrid · Today
AWS · Azure · Bash · CI/CD · Classification · Compliance

Responsibilities

  • Build and maintain reliability tooling - observability dashboards, automated alerts, runbooks, and remediation scripts that reduce toil and improve mean time to recovery
  • Develop automation solutions - use Python, Java, and shell scripting to eliminate manual operational processes, from certificate rotation to compliance artifact generation
  • Troubleshoot and debug complex production issues - diagnose failures across distributed systems spanning on-prem data centers and AWS, identify root causes, and implement durable fixes
  • Contribute to observability - configure and tune monitoring in Datadog and Observe, build dashboards that surface actionable signals, and reduce unactionable alert volume
  • Support incident response - participate in on-call rotations, respond to production incidents, drive diagnosis, and contribute to blameless postmortems
  • Leverage AI tools to accelerate engineering - use agentic AI automation (Claude Code and others) to develop solutions, generate runbook drafts, and build automation agents
  • Manage secrets and certificates - automate rotation and provisioning, maintaining security posture without manual toil
  • Deliver through CI/CD pipelines - build, test, and deploy automation via continuous integration and API automation frameworks

What Success Looks Like

  • Independently troubleshooting and resolving production issues within your domain without escalation
  • At least one operational process fully automated and running in production
  • Contributing measurably to team OKRs - whether that's alert noise reduction, MTTR improvement, or settlement cycle reliability
  • Producing or improving runbooks and dashboards that your teammates and partner teams actively use

Requirements

  • Professional English fluency
  • Bachelor's degree
  • Background in SRE, production operations, or reliability engineering
  • At least 4 years of experience in DevOps Engineering (internship experience does not apply)
  • 4+ years of experience in at least one of the following: Java, Python, Go
  • At least 2 years of experience with Cloud Native technologies (Amazon Web Services, Microsoft Azure, Google Cloud Platform)
  • 2+ years of experience with container orchestration services including Docker or Kubernetes
  • Experience with Shell or Bash scripting
  • At least 2 years of Unix or Linux system administration experience
  • Experience developing automation solutions using agentic AI tools (Claude Code, Copilot CLI)
  • Troubleshooting and debugging skills across distributed systems
  • Familiarity with payments, financial services, or other regulated high-availability domains
  • Knowledge of or experience with networking concepts (TCP, DNS, TLS)

At Capital One, we respect individual differences in culture, religion, and ethnicity. Likewise, we promote equal opportunities and development for all personnel. In the hiring process, we seek to provide equal employment opportunities to candidates,

Benefits

Vision insurance

Additional Information

WeWork Reforma Latino (97001), Mexico, Ciudad de Mexico, Ciudad de Mexico

Principal Associate SRE

We're building a Site Reliability Engineering center in Mexico City and hiring Principal Associate SREs to join one of our founding teams. You'll work on payment-critical systems across the Discover Network, Diners Club International, and PULSE - contributing to settlement reliability, alert quality, observability, and automation that directly impacts millions of transactions daily.

This is a ground-floor opportunity. You'll be part of the first cohort of engineers in CDMX, working alongside experienced SRE leaders to build the operational muscle that allows Mexico City to own reliability outcomes independently.

Depending on team placement, you'll focus on one of the following areas:

  • Settlement - ensuring batch settlement cycles complete accurately, on time, and in compliance with regulatory requirements across domestic credit/debit and international cross-border networks
  • Alert Signal & Observability - reducing alert noise, building automated severity classification, and creating customer impact dashboards that make incident response faster and more decisive
  • Reliability Automation & Platform Convergence - building automated runbooks, driving Capital One platform adoption, and developing AI-powered remediation workflows

