Staff SRE, Ads

Reddit · Remote
Full-time · Remote · Posted today
BigQuery · Kafka · Kubernetes · Leadership · Linux · Machine Learning

Responsibilities

  • Lead reliability initiatives across multiple Ads domains including ad serving, auctions, targeting, reporting, measurement, and billing.
  • Partner with engineering leadership to improve reliability, scalability, operational excellence, and engineering efficiency across the Ads organization.
  • Drive architecture reviews and influence technical decisions impacting critical revenue-generating systems.
  • Design and build platforms, tooling, and automation that improve reliability and developer productivity at scale.
  • Participate in on-call rotations, lead complex incident investigations, and coordinate cross-functional response efforts during major production events.
  • Identify systemic reliability risks and drive long-term solutions that improve platform resilience.
  • Establish reliability metrics around advertiser-critical user journeys such as campaign creation, ad delivery, auction participation, reporting, attribution, and billing.
  • Mentor engineers and provide technical leadership across multiple teams.
  • Influence roadmap planning and ensure reliability considerations are incorporated into product and infrastructure investments.

Requirements

  • 8+ years of experience in Site Reliability Engineering, Infrastructure Engineering, or related roles operating large-scale distributed systems.
  • Strong experience supporting high-traffic, user-facing production environments.
  • Deep understanding of distributed systems, networking, Linux systems, and cloud-native architectures.
  • Experience designing highly available systems with strong operational and reliability practices.
  • Strong understanding of observability systems including metrics, logging, tracing, and alerting.
  • Good programming skills in Go, Python, or a similar language.
  • Experience improving reliability through SLOs, automation, incident management, and performance optimization.
  • Demonstrated ability to troubleshoot complex issues across a modern distributed system stack.
  • Strong collaboration and communication skills with the ability to influence technical direction across teams.
  • Experience supporting advertising technology platforms or other large-scale revenue-critical systems.
  • Deep understanding of reliability challenges associated with ad-serving, real-time auctions, budget pacing, campaign delivery, measurement, attribution, or billing systems.
  • Experience operating high-QPS, low-latency services where latency directly impacts business outcomes.
  • Experience establishing reliability programs that deliver meaningful, measurable business outcomes.
  • Experience with Kubernetes, cloud infrastructure, and large-scale distributed systems.
  • Familiarity with Kafka, ClickHouse, Spark, Flink, BigQuery, or similar large-scale data platforms.
  • Experience partnering with Product, Data Science, and Ads Engineering organizations.
  • Experience supporting machine learning inference or recommendation systems at scale.

Benefits

Global benefit programs that fit your lifestyle, from workspace to professional development to caregiving support:
  • Family Planning Support
  • Gender-Affirming Care
  • Mental Health & Coaching Benefits
  • Private Medical, Dental, and Vision Benefits
  • Personal Retirement Savings Account with matching contribution
  • Cycle to Work and Tax Saver schemes
  • Flexible Vacation & Paid Volunteer Time Off
  • Generous Paid Parental Leave

Health insurance · Dental insurance · Vision insurance · Paid time off · Remote work options · Flexible schedule · Parental leave

In select roles and locations, interviews will be recorded, transcribed, and summarized by artificial intelligence (AI). You will have the opportunity to opt out of recording, transcription, and summarization prior to any scheduled interviews. During the interview, we will collect the following categories of personal info

Additional Information

Reddit is a community of communities. It's built on shared interests, passion, and trust, and is home to the most open and authentic conversations on the internet. Every day, Reddit users submit, vote, and comment on the topics they care most about. With 100,000+ active communities and approximately 126 million daily active unique visitors, Reddit is one of the internet's largest sources of information. For more information, visit www.redditinc.com.

Location: Reddit has a flexible-first workforce. Don't live near our office? No worries: you can work remotely from anywhere in the UK, the Netherlands, or Ireland.

The Ads organization powers Reddit's advertising platform, enabling advertisers to reach highly engaged communities while helping Reddit grow its business. The reliability of our Ads systems directly impacts advertiser success, revenue generation, and user experience. The Ads Reliability team partners closely with Ads Engineering teams to improve reliability, scalability, operational excellence, and developer productivity across Reddit's advertising ecosystem.

We're looking for a Staff Site Reliability Engineer to provide technical leadership for reliability initiatives across the Ads organization and help shape the future of Ads infrastructure at Reddit.
