Staff Technical Program Manager, Site Reliability Engineering
External · Full-time · On-site · 2w ago
AWS, Azure, Compliance, Incident Response, Jira, Kubernetes
Responsibilities
- Drive Program Planning & Execution - Define program scope, milestones, and success criteria with SRE engineers and leaders. Manage dependencies across platform teams, keep work clearly tracked in Jira, and deliver on time
- Strengthen Production Reliability - Lead change management and launch readiness programs. Partner with SREs and product teams to define and operationalize SLOs/SLIs, and use incident data, metrics, and capacity signals to drive prioritization and continuous improvement
- Lead Cross-Functional Coordination - Align SRE with Security, Compliance, Cloud platform, and other engineering teams. Coordinate cross-team incident response, ensure clear follow-through, and build trust as the go-to driver of complex, multi-team efforts
- Build Scalable Systems & Processes - Design lightweight frameworks and communication patterns that help SRE deliver reliably at scale. Work yourself out of the "hero" role by leaving teams better equipped to execute independently
Requirements
- 8+ years in technical program management, engineering management, or a comparable technical role partnering with software engineering teams
- Proven track record leading large-scale, cross-team platform initiatives through ambiguity and change
- Strong knowledge of production change management, software development lifecycle, and reliability metrics (SLOs, SLIs)
- Skilled at shaping roadmaps and managing dependencies
- Able to query and interpret metrics, logs, or other data sources to inform decisions and communicate risk
- Excellent communicator: clear, concise, and calm with engineers, cross-functional partners, and executives
- Low-ego, highly collaborative, and motivated by ownership of hard problems end to end
- Hands-on or close-partner experience with Kubernetes, cloud networking, or observability stacks (metrics, logs, tracing, alerting)
- Prior experience working with or alongside SRE teams
- Background in large-scale cloud infrastructure or platform engineering
- Familiarity with MongoDB Atlas or other modern cloud database platforms
About MongoDB
- With offices worldwide and nearly 60,000 customers, including 75% of the Fortune 100 and AI-native startups, relying on MongoDB for their most important applications, we're powering the next era of software.
- Our compass at MongoDB is our Leadership Commitment, guiding how and why we make decisions, show up for each other, and win. It's what makes us MongoDB.
- MongoDB is committed to providing any necessary accommodations for individuals with disabilities within our application and interview process. To request an accommodation due to a disability, please inform your recruiter.
- MongoDB is an equal opportunities employer.
- REQ ID: 1273416890
Benefits
- Remote work options
- Parental leave
Additional Information
As a TPM for SRE, you will partner with SRE leaders and engineers to scale the platform that underpins all of MongoDB's cloud products. You will drive program execution, strengthen production reliability practices, and coordinate cross-functional efforts across US and EMEA teams. Success in this role means smoother launches, clearer roadmaps, stronger reliability metrics, and an SRE organization that is better equipped to deliver predictably at scale. This role can be based in our Dublin or Cork office or remotely in Ireland.