Senior Director, Cloud and Site Reliability Engineering

External

Tricentis · Czech Republic

Full-time · Hybrid · 2w ago

AWS · Azure · Chaos Engineering · CI/CD · Compliance · GCP

Responsibilities

Cloud Strategy & Infrastructure Leadership
Define and execute the cloud infrastructure roadmap to support Tricentis' SaaS platform growth, reliability, and scalability goals across AWS, Azure, and GCP.
Establish cloud architecture standards and best practices including multi-cloud, hybrid-cloud, and cloud-native strategies.
Drive infrastructure cost optimization and efficiency, partnering with Finance and Engineering leadership to align cloud spending with business outcomes.
Lead the adoption of modern cloud technologies and emerging AI and agentic capabilities to advance the platform.
Collaborate with peer Engineering and Product leaders to align cloud and infrastructure initiatives with product roadmap and business goals.
Site Reliability Engineering & Operational Excellence
Build and mature the SRE function, defining SLOs, SLIs, and error budgets that reflect customer expectations and business commitments.
Enhance operational effectiveness by deploying and using agentic capabilities to scale the team and improve the performance and reliability of our SaaS products.
Own the incident management and on-call strategy, establishing effective processes for detection, response, remediation, and post-incident review that improve MTTR.
Champion a culture of reliability, embedding SRE principles across the broader Engineering organization to reduce toil and improve system resilience.
Drive automation across infrastructure provisioning, monitoring, observability, and self-healing systems.
Partner with Security to ensure cloud environments meet compliance requirements (SOC 2, ISO 27001, ISO 42001, GDPR, FedRAMP, and others as required).
Engineering Execution & Delivery
Work with Engineering teams to influence infrastructure design earlier in the agentic development process, making it a first-class design constraint through AI skills and agents.
Oversee infrastructure delivery and operational readiness for all product releases, ensuring systems are observable, scalable, and fault tolerant.
Drive continuous improvement in CI/CD pipelines, deployment processes, and DevOps tooling in partnership with product engineering teams.
Establish and enforce infrastructure-as-code practices (Terraform, Pulumi, or equivalent) to increase consistency and reduce operational risk.
Define and track key reliability, performance, and availability metrics, reporting regularly to senior leadership on platform health.

Requirements

10+ years of experience in cloud infrastructure, DevOps, or Site Reliability Engineering, with at least 5 years in senior engineering leadership roles.
Proven track record leading Cloud or SRE organizations at scale within SaaS or enterprise software companies.
Deep expertise in major cloud platforms (AWS, Azure, and/or GCP), including compute, networking, storage, security, and managed services.
Strong background in SRE principles, including SLO/SLI/error budget frameworks, observability, chaos engineering, and incident management.
Hands-on experience with Kubernetes, Terraform, CI/CD tooling, and modern infrastructure-as-code practices.
Experience with compliance frameworks (SOC 2, ISO 27001, FedRAMP, GDPR) and operating in regulated environments.
Excellent communication and influencing skills, with the ability to translate complex technical concepts into clear business impact.
What you can look forward to:
Flexible working schedule (no core hours)
Hybrid work environment (3 days in the office)
25 days of paid time off
2 days of paid Volunteering Leave per year to get involved in your local community or in a cause that matters to you
Meal vouchers
Pension Contribution
Profit sharing and holiday bonus
A team of passionate professionals who are experts in their fields
Events for employees to learn, celebrate and socialize (training sessions, hackathons, parties, sports events, board game gatherings, BBQs) and much more
Tricentis Core Values:
Knowing what we need to achieve and how to achieve it is important. Tricentis core values define our ways of working and the behaviors we model that create an enjoyable and successful Tricentis life.
Demonstrate Self-Awareness: Own your strengths and limitations.
Finish What We Start: Do what we say we are going to do.
Move Fast: Create momentum and efficiency.
Run Towards Change: Challenge the status quo.
Serve Our Customers & Communities: Create a positive experience with each interaction.
Solve Problems Together: We win or lose a

Benefits

Health insurance · Vision insurance · Flexible schedule · Performance bonus

Additional Information

We are looking for an experienced and strategic leader to build and scale our Cloud and Site Reliability Engineering (SRE) organization. You will define and drive the cloud infrastructure strategy and operational excellence that underpin Tricentis' SaaS platform, ensuring the highest levels of availability, reliability, and performance. You will lead a team of talented Cloud Engineers and SREs, fostering a culture of excellence, automation-first thinking, and continuous improvement.
