Staff Engineer, Site Reliability
About the role
Babylist is the leading platform for expecting and new families. More than 10 million people shop with Babylist every year, making it the go-to destination for seamless purchasing, guidance, and expert recomme
Requirements
- Deep hands-on Terraform expertise - you own IaC, not just contribute to it
- Proven AWS experience at scale - EKS, RDS, cloud networking, DNS, CDNs, load balancers - you know the gotchas
- Experienced operating Kubernetes in production - you've debugged the hard stuff, not just deployed the easy stuff
- Comfortable designing and improving CI/CD systems - CircleCI, GitHub Actions, or similar; you care about developer velocity, not just pipeline uptime
- Strong observability instincts - Datadog, Sentry, PagerDuty, Cronitor - you build alerting that's actionable, not noisy
- Experienced with on-call and incident management - you've run the post-mortems and actually changed things afterward
- Comfortable supporting developers across local, staging, and production - you're a resource, not a gatekeeper
- You naturally reach for AI in your work - at Babylist, every team uses AI daily. You're already using it to move faster and improve your output, and you stay curious about what's coming next.
How You Will Make An Impact
- Infrastructure ownership - manage and evolve our AWS environment using Terraform, keeping EKS clusters, databases, and core services current and performant
- CI/CD reliability - own the speed and reliability of our CI systems for the full Engineering org - every deploy starts here
- Developer support - be the person engineers turn to when environments break; unblock them fast across local, staging, and production
- Monitoring & alerting standards - establish and socialize best practices so the right people get paged for the right reasons
- Incident response - lead or support incident response, drive post-incident reviews, and close the loop so the same thing doesn't happen twice
- Platform strategy - contribute to architectural decisions that shape how Babylist's infrastructure evolves over the next several years
Why This Role
- Platform is the team every engineering team depends on - your work has outsized leverage across the entire product org, not just one area
- The infrastructure is solid but actively evolving - you're not inheriting chaos, you're shaping what comes next
- This is a staff-level role with real cross-team visibility - you'll influence how Babylist engineers build and ship, not just keep the lights on
- You'll work on systems that support millions of families at a high-stakes life moment - the scale is real and the product context makes the reliability work matter
Benefits
Additional Information
How We Build
Babylist is in the middle of a fundamental shift in how software gets made, and we are not tiptoeing into it. We are rebuilding our engineering culture around a simple belief: AI changes everything. How teams are structured, how decisions get made, how fast ideas become working software. Our engineers own problems end to end, working directly with product, design, and business partners with short feedback loops and real stakeholder access. We ship, learn, and iterate fast. When something is not working, we throw it out and start over - project failure and personal failure are not the same thing here.
AI tools are as natural to our workflow as an IDE or version control. We are not exploring this, we are living it. Our engineers use AI to explore tradeoffs, pressure-test designs, and move from problem to solution in hours instead of days. They generate code with AI so they can stay focused on the decisions that actually require human judgment - not the routine ones. More velocity means more time for craft: better test coverage, stronger architecture, and deeper customer understanding. We hold ourselves to a higher quality bar because of AI, not in spite of it.
We are building this playbook in real time, and we are looking for people who want to build it with us. If you have already changed how you work because of AI - or you are ready to - and you care more about shipping something great than following a prescribed process, we should talk.
Our Tech Stack
- Ruby on Rails
- AWS
- Sidekiq
- MySQL
- Redis
What the Role Is
Babylist's Platform team is the foundation every engineering team builds on - and this role is at the center of keeping it reliable, fast, and scalable. As a Staff SRE, you'll own the infrastructure and reliability practices that support 9 million+ users and the engineers who build for them.
Babylist started as an e-commerce and registry platform, and we're actively growing beyond that - into health, media, mobile, and new product surfaces that don't exist yet. The Platform team is the foundation that makes all of it possible. This isn't a maintenance role - you'll be actively evolving how we build and operate AWS infrastructure, CI systems, and developer tooling. You'll work cross-functionally across all of Babylist Engineering, which means your decisions have wide leverage.