Senior Manager, Site Reliability Engineering

Tubitv · San Francisco
Full-time · Hybrid · 1mo ago
Cross-functional Collaboration · Incident Response · Leadership · Machine Learning · Move · Observability

About the role

Site Reliability Engineering (SRE) at Tubi is not a traditional operations team. We are a software engineering organization that applies a developer's mindset and toolkit to the challenges of building and running large-scale, distributed systems. Our mission is to engineer resilience from the ground up, enabling our product teams to innovate rapidly while ensuring our users have a stellar experience. We own the availability, latency, performance, and capacity of our platform, and we achieve our goals through a culture of data-driven decision-making, blameless learning, and relentless automation.

We are seeking an experienced and visionary Senior SRE Manager to lead and grow our newly built Site Reliability Engineering team. You are more than a people manager or a tech lead; you are the strategic leader responsible for architecting our reliability roadmap. You will build and mentor a team of talented engineers, foster a culture of blameless learning and continuous improvement, and champion the engineering practices that allow us to balance rapid innovation with rock-solid stability. You will be a key influencer in our engineering leadership, partnering with peers across the organization to ensure reliability is a shared responsibility and a core tenet of our engineering culture.

Responsibilities

  • Team Leadership & Mentorship:
  • Establish equitable, sustainable on-call practices (including global coverage where applicable) that protect focus time and avoid burnout.
  • Define team rituals - runbook reviews, game days, and incident retros - that reinforce quality and learning.
  • Operational Excellence & Incident Management:
  • Streamline and improve our existing processes and practices, and collaborate with other teams to raise our production release standards.
  • Define and tune a 24×7 on-call rotation for low noise and fast response; act as executive escalation partner during major incidents.
  • Own disaster-recovery strategy (playbooks, failover drills, recovery simulations) and track SLO gaps with time-bound remediations.
  • Financial & Vendor Management: Own the SRE budget, tooling, and headcount. Manage relationships with key third-party vendors for our observability and SRE-related AI platforms, work with the infrastructure lead and finance team on contract negotiations, and ensure we derive maximum value from our investments.
  • Building the Business Case: Building the techno-eco

Benefits

Vision insurance
