Site Reliability Engineer (SRE)
About the role
Monstro is building a secure, multi-tenant platform on Google Cloud, and we're hiring a Site Reliability Engineer to own the reliability and observability of that platform end-to-end. This is a hands-on role for someone who wants to do real SRE work - not a rebrand of L1 support. You'll write the dashboards, define the SLOs, build the automation that kills toil, and take your turn on the on-call rotation that proves it all works. When something breaks at 2 AM, you're the person who keeps it running; when nothing's breaking, you're the person making sure the next break is smaller, shorter, or doesn't happen at all.
Responsibilities
Observability and reliability engineering:
- Define and maintain SLOs and SLIs for our tier-1 services: API gateway, application services, identity, and edge availability
- Build canonical dashboards and alerts in Google Cloud Monitoring, backed by structured logs and BigQuery log analytics
- Tune alert routing so every page is actionable - kill the rest
- Instrument services for distributed tracing and structured logging; push back on services that ship without it
- Own error budgets and use them to prioritize reliability work over feature work when a budget is burned
- Reduce toil: automate the top recurring page from the previous quarter
- Maintain runbooks so every page maps to one within a cycle of first occurrence
On-call rotation and incident response:
- First responder for production alerts across monitoring, API gateway, edge defense, and CI
- Triage severity, run the incident bridge, drive mitigation (revision rollback, traffic shift, scaling, edge block, credential rotation)
- Own internal and external incident comms during your shift
- Drive postmortems to closure with action items tracked as audit evidence
- Write clean handoffs at the end of each shift
Our stack
- Google Cloud Platform across multiple environments
- Apigee X for API management
- Cloud Run, GKE Autopilot, Cloud SQL
- Identity Platform for customer identity
- Cloud Armor, Cloud IDS, Security Command Center for edge and posture
- BigQuery-backed log analytics from an org-level log sink
- OpenTofu / Terraform for everything; GitHub Actions for CI/CD
- Linear for work tracking
What You Bring
Required:
- Solid production experience on GCP (or comparable AWS/Azure depth with willingness to ramp on GCP fast)
- Comfortable on-call: you've run incidents, written postmortems, and shipped the action items
- Strong observability fundamentals: SLOs, log-based metrics, alert hygiene, dashboard discipline
- Working knowledge of Kubernetes, API gateways, identity systems, and at least one IaC tool
- Scripting / coding fluency (Python, Go, Bash) for automation and tooling
- Good written communication - handoffs, postmortems, and runbooks are part of the job
- Bias toward fixing the system, not the symptom
Requirements
- Apigee or another enterprise API gateway in production
- BigQuery for log analytics or audit
- Experience standing up observability from scratch, not just maintaining inherited dashboards
- SOC 2 or similar compliance environments
Why Join Us
If you enjoy the puzzle of understanding complex environments, the satisfaction of a well-organised document, and the energy of working directly with clients, this is your role.
Why Monstro
- Ownership & Impact: Shape the future of AI-powered finance - building a category-defining product used by consumers and institutions around the world.
- Experienced Team: Join a team with leadership that has a track record of scaling companies from early stage to major exits.
- Principles-Driven Culture: Work in a culture that values speed, ownership, and impact - what most companies achieve in 90 days, we do in 45.
- Competitive salary, with potential for expanded compensation and benefits
Additional Information
About Monstro
Monstro is the operating system for governed financial intelligence. We build governance and intelligence infrastructure that enables artificial intelligence to operate safely, explainably, and at institutional scale. We exist because the level of financial guidance historically available to a small group should be accessible to many more people. By combining AI with deep institutional infrastructure, we help financial institutions deliver more personalized, responsible, and life-changing financial support to millions of individuals. We're building mission-critical systems in a highly regulated domain, and we care deeply about doing it right. If you're motivated by meaningful problems, high standards, and shaping infrastructure that improves financial outcomes, you'll feel at home here.