Director, Site Reliability Engineering

Doctolib · Berlin, Germany
Full-time · On-site · Posted 2d ago
AWS · CDN · Compliance · Datadog · GCP · GDPR

Responsibilities

  • Build and run a world-class SRE org of 25+ engineers across Cloud Infrastructure, Database & Storage, Network Infrastructure, Observability Tooling, and the Doctolib Operations Center
  • Own the infrastructure strategy and roadmap - cloud, database, network, observability - and deliver against company OKRs
  • Lead the Doctolib Operations Center: set incident response standards, drive MTTR reduction, embed blameless post-mortem culture across engineering
  • Architect and execute our multi-cloud strategy - reducing vendor lock-in, cutting migration costs, and enabling international expansion
  • Own network infrastructure at scale: load balancing, CDN/WAF, VPCs, peering, zero-trust networking across a high-traffic, multi-country platform
  • Drive observability as a product - give 700+ engineers true visibility into system health and turn observability maturity into an operational excellence lever
  • Lead from the front as a senior technical voice in the Platform org and broader Tech leadership team

Requirements

  • 12+ years in software engineering, including 5+ years leading managers and running infrastructure or SRE organisations at scale
  • Track record of taking SRE practices from reactive to proactive - with measurable reductions in incidents and MTTR
  • Strong multi-cloud and network infrastructure experience: load balancing, CDN/WAF, VPCs, peering, at high-traffic scale
  • Deep database operations background: large-scale transactional systems (PostgreSQL, Aurora), streaming/CDC (Kafka), data layer FinOps
  • Experience building observability platforms that give teams genuine visibility - metrics, logs, traces, alerting
  • Sharp process thinking: SLOs, error budgets, incident management, blameless post-mortems
  • Outcome-driven: you track reliability, cost efficiency, and engineering velocity as business metrics, not just technical ones
  • Strong communicator and influencer at executive level - equally credible with senior engineers and business stakeholders
  • Builder of high-performing, people-first engineering cultures
  • Fluent in English; comfortable in fast-paced, international environments
  • You recognise yourself in our playbook values

Bonus Points If You Have...

  • Experience in healthcare, regulated, or high-compliance industries (HDS, ISO 27001, SOC2, GDPR, data sovereignty)
  • Familiarity with our stack: Ruby on Rails, Node.js, Go, Python, React, AWS, GCP, Kubernetes, PostgreSQL, Datadog, GitHub Actions
  • French language proficiency
  • Experience with AI-augmented infrastructure tooling or ML platform operations
  • M&A or post-acquisition infrastructure integration experience

Benefits

  • A Deutschlandticket (Germany-wide public transport pass) fully paid for by Doctolib
  • 28 vacation days + 1 additional day for each full calendar year of employment (up to a maximum of 30 days)
  • Work from abroad for up to 10 days per year thanks to our flexibility days policy
  • Company health insurance with great supplementary benefits through our partner Allianz
  • Company pension scheme (bAV) through Allianz with an employer subsidy of 40% (15% within the probationary period)
  • Enrollment in Doctolib's long-term employee value sharing plan called DoctoGrowth
  • The Doctolib Parent Care program, which includes one month additional parental leave
  • Health insurance
  • Paid time off
  • Performance bonus
  • Parental leave

Additional Information

Why this role

As our Director of Site Reliability Engineering, reporting to our VP of Platform Engineering, you'll own the core infrastructure layers that everything at Doctolib runs on: cloud infrastructure, database operations, network infrastructure, and observability. You will also lead the Doctolib Operations Center (DOC) and drive a decisive shift from reactive operations to a proactive, world-class reliability culture.

This is a rare opportunity to shape the infrastructure backbone of Europe's leading healthtech company, at a moment when Doctolib is actively expanding multi-cloud capabilities, scaling to new countries, and building the reliability culture that will define the next decade of healthcare innovation.

Why this is an extraordinary challenge

  • Real stakes, every day. When Doctolib is down, consultations don't happen, diagnoses are delayed, care journeys are interrupted. The infrastructure you build is a direct lever on patient outcomes - in a world where 8 of the top 10 causes of death in Europe are preventable.
  • A once-in-a-generation platform transition. Multi-cloud, monolith modularisation, international expansion - all happening simultaneously. You won't inherit a finished platform. You'll define what it becomes.
  • Reliability as the competitive moat. As we scale AI health companions, automate clinical workflows, and launch across Europe, the speed and resilience of the platform directly determines how fast 700+ engineers can ship innovations that change healthcare.
  • A cultural build, not just a technical one. The incident response culture, observability standards, and operational ownership model you establish here will shape how Doctolib engineers work for years to come.

