Staff SRE, Ads
Responsibilities
- Lead reliability initiatives across multiple Ads domains including ad serving, auctions, targeting, reporting, measurement, and billing.
- Partner with engineering leadership to improve reliability, scalability, operational excellence, and engineering efficiency across the Ads organization.
- Drive architecture reviews and influence technical decisions impacting critical revenue-generating systems.
- Design and build platforms, tooling, and automation that improve reliability and developer productivity at scale.
- Participate in on-call rotations, lead complex incident investigations, and coordinate cross-functional response efforts during major production events.
- Identify systemic reliability risks and drive long-term solutions that improve platform resilience.
- Establish reliability metrics around advertiser-critical user journeys such as campaign creation, ad delivery, auction participation, reporting, attribution, and billing.
- Mentor engineers and provide technical leadership across multiple teams.
- Influence roadmap planning and ensure reliability considerations are incorporated into product and infrastructure investments.
Requirements
- 8+ years of experience in Site Reliability Engineering, Infrastructure Engineering, or related roles operating large-scale distributed systems.
- Strong experience supporting high-traffic, user-facing production environments.
- Deep understanding of distributed systems, networking, Linux systems, and cloud-native architectures.
- Experience designing highly available systems with strong operational and reliability practices.
- Strong understanding of observability systems including metrics, logging, tracing, and alerting.
- Good programming skills in languages such as Go, Python, or similar.
- Experience improving reliability through SLOs, automation, incident management, and performance optimization.
- Demonstrated ability to troubleshoot complex issues across a modern distributed system stack.
- Strong collaboration and communication skills with the ability to influence technical direction across teams.
- Experience supporting advertising technology platforms or other large-scale revenue-critical systems.
- Deep understanding of reliability challenges associated with ad-serving, real-time auctions, budget pacing, campaign delivery, measurement, attribution, or billing systems.
- Experience operating high-QPS, low-latency services where latency directly impacts business outcomes.
- Experience establishing reliability programs that deliver meaningful, measurable business outcomes.
- Experience with Kubernetes, cloud infrastructure, and large-scale distributed systems.
- Familiarity with Kafka, ClickHouse, Spark, Flink, BigQuery, or similar large-scale data platforms.
- Experience partnering with Product, Data Science, and Ads Engineering organizations.
- Experience supporting machine learning inference or recommendation systems at scale.
Benefits
Additional Information
Reddit is a community of communities. It's built on shared interests, passion, and trust, and is home to the most open and authentic conversations on the internet. Every day, Reddit users submit, vote, and comment on the topics they care most about. With 100,000+ active communities and approximately 126 million daily active unique visitors, Reddit is one of the internet's largest sources of information. For more information, visit www.redditinc.com.
Location: Reddit has a flexible first workforce. Don't live near our office? No worries: you can work remotely from anywhere in the UK, the Netherlands or Ireland.
The Ads organization powers Reddit's advertising platform, enabling advertisers to reach highly engaged communities while helping Reddit grow its business. The reliability of our Ads systems directly impacts advertiser success, revenue generation, and user experience. The Ads Reliability team partners closely with Ads Engineering teams to improve reliability, scalability, operational excellence, and developer productivity across Reddit's advertising ecosystem.
We're looking for a Staff Site Reliability Engineer who will provide technical leadership for reliability initiatives across the Ads organization and help shape the future of Ads infrastructure at Reddit.