Senior Site Reliability Engineer

External

Roche · Sant Cugat Del Vallès

Full-time · On-site · Today

AWS · Azure · Capacity Planning · CI/CD · Kubernetes · Leadership

About the role

At Roche, we are passionate about transforming patients' lives, and we are bold in both decision and action - we believe that good business means a better world. That is why we come to work every single day. We commit ourselves to scientific rigor, unassailable ethics and access to medical innovations for all. We do this today to build a better tomorrow.

Roche is strongly committed to a diverse and inclusive workplace. We strive to build teams that represent a range of backgrounds, perspectives and skills. Embracing diversity enables us to create a great place to work and to innovate for patients.

Step into the Future of IT with Roche! As a seasoned Site Reliability Engineer (SRE) at Roche, you will leverage your deep software engineering expertise to propel our products to new heights of robustness, scalability and reliability. This isn't just a role; it's an invitation to shape the backbone of technological innovation.

Your Mission

Design and maintain cutting-edge tools, scripts and frameworks that automate repetitive tasks, streamline software deployment and manage expansive systems with unparalleled efficiency. Partner closely with forward-thinking development teams to architect and implement high-performance solutions that elevate system efficiency, optimize resource utilization and enhance deployment processes for superior uptime and user satisfaction.

Responsibilities

Lead the charge in incident management and response. Detect system anomalies, troubleshoot swiftly and conduct thorough root cause analyses to prevent recurring issues.
By joining our elite team, you will play a pivotal role in delivering seamless experiences to our end-users, exceeding business and customer demands, and solidifying Roche's reputation as a leader in IT innovation.
Your Core Responsibilities
Reliability Engineering & Architecture
Define and implement SLIs, SLOs, and error budgets with product and engineering teams
Conduct reliability reviews for new and existing services
Design scalable, fault-tolerant architectures in AWS and Azure environments
Lead capacity planning, performance and cost optimization initiatives
Improve system resilience through automation and self-healing patterns
Drive organizational observability maturity (metrics, logs, traces, alert quality)
Incident Management & Continuous Improvement
Perform complex root cause analysis and drive rapid mitigation
Participate in blameless postmortems and follow-through
Improve MTTR, reduce incident frequency, and elevate production standards
Collaborate seamlessly with engineering teams to enable timely and effective resolutions
Handle requests and incidents, create and maintain runbooks
Participate in a structured 24/7 on-call rotation
Automation & Platform Engineering
Reduce operational toil through tooling and automation (Python or similar)
Improve CI/CD reliability and deployment safety mechanisms
Build and maintain infrastructure-as-code (Terraform or equivalent)
Enhance Kubernetes platform reliability (EKS, AKS, or similar)
Cross-Functional Leadership
Partner with business, engineering, security, and cloud teams to embed reliability early in the software development life cycle
Mentor mid-level engineers and help shape SRE best practices
Champion a culture of ownership, accountability, and continuous improvement

Requirements

Minimum of a bachelor's degree in Computer Science, Engineering, or a related field, or equivalent professional experience.
Experience in site reliability engineering, software engineering, or a related field, including production on-call experience.
Solid experience with AWS and/or

Benefits

Health insurance

Additional Information

At Roche you can show up as yourself, embraced for the unique qualities you bring. Our culture encourages personal expression, open dialogue, and genuine connections, where you are valued, accepted and respected for who you are, allowing you to thrive both personally and professionally. This is how we aim to prevent, stop and cure diseases and ensure everyone has access to healthcare today and for generations to come. Join Roche, where every voice matters.

The Position

We are building a global Site Reliability Engineering (SRE) team to support critical commercial and internal platforms and applications. As an SRE, you will help design, build, and scale reliable distributed systems that power healthcare innovation worldwide. This role is focused on reliability, scalability, automation and operational excellence. You will influence system design, define reliability standards and reduce operational toil through engineering solutions. This role includes participation in a structured on-call rotation.
