Senior Observability Engineer

Rent the Runway · Galway, Ireland

Full-time · On-site · 1d ago

CI/CD · Compliance · Documentation · Incident Response · Kubernetes · Machine Learning

About the role

Rent the Runway (RTR) is transforming the way we get dressed by pioneering the world's first Closet in the Cloud. Founded in 2009, RTR has disrupted the $2.4 trillion fashion industry by inspiring women with a more joyful, sustainable and financially savvy way to feel their best every day. As the ultimate destination for circular fashion, the brand now offers infinite points of access to its shared closet via a fully customizable subscription to fashion, one-time rental or ownership. RTR offers designer apparel and accessories from hundreds of brand partners and has built in-house proprietary technology and a one-of-a-kind reverse logistics operation. RTR has been named to CNBC's "Disruptor 50" five times in ten years, and has been placed on Fast Company's Most Innovative Companies list multiple times.

Galway Office: Rent the Runway established its European Technology Hub in Galway in April 2019. Based in the historic Claddagh area of the city, the growing team in Galway tackles core technology challenges and influences the next generation of services critical to Rent the Runway's success and continued growth. The Galway office is Rent the Runway's first international office outside the US and enables the company to significantly expand its Software Engineering, Product Development, Machine Learning Engineering and Data Science footprint. Rent the Runway's Galway-based employees have the opportunity to grow their careers across several roles and career paths in Technology.

Our engineering team is smart, pragmatic, and entrepreneurial. We practice continuous integration and test-driven development, engage in constant peer code reviews and pair programming, and work hard to give back to the software community through open-source contributions.

We're looking for a Senior Observability Engineer to lead the development and scaling of telemetry systems that keep our platforms reliable, performant, and resilient. You'll play a critical role in shaping observability practices across engineering and infrastructure teams, enabling better incident response, deeper system insight, and stronger delivery outcomes. You will define how we measure, detect, and respond to what matters, empowering teams to build and operate services with confidence. As a senior contributor, you will own key initiatives, drive adoption of observability standards, and collaborate across the company to enable scalable, efficient engineering.

Responsibilities

Lead the architecture, delivery, and continuous improvement of observability solutions, leveraging tools like Splunk Observability Cloud, Google Cloud Observability or equivalent platforms.
Build scalable, automated telemetry pipelines using Terraform and modern Infrastructure-as-Code (IaC) workflows to support auditability, reliability, and self-service adoption.
Define and evolve best practices for metrics, traces, logs, and events to ensure high-signal alerting and consistent, actionable instrumentation across our systems.
Collaborate with application, platform, security, and compliance teams to integrate observability into every phase of the development lifecycle, from instrumentation patterns to SLOs to post-incident reviews.
Drive the definition and adoption of internal standards for service-level indicators (SLIs), error budgets, and system health across services.
Lead adoption of modern AI-assisted development and debugging workflows within observability tooling and automation, helping teams accelerate incident response, system instrumentation, and root cause analysis.
Identify and resolve cross-functional observability challenges through simplification, standardisation, and durable solution design.
Participate in SRE/Platform on-call rotations to stay connected to production challenges and continuously improve observability tooling, alert design, and incident response.
Provide technical guidance to engineers across teams, helping them apply telemetry principles to improve system insight.
Champion a culture of reliability through enablement, building reusable frameworks, shared documentation, and team-level training that improves observability without increasing toil.

About You

5+ years of experience in SRE, DevOps, or platform engineering roles, with a deep specialisation in observability systems and telemetry design.
Recognised as a subject-matter expert in observability tools and practices, especially in cloud-native, distributed systems environments.
Proven track record of delivering high-impact technical projects that improve system visibility, reduce operational risk, or enhance incident response at scale.
Deep knowledge of metrics, logs, traces, and events, and how to apply them to both product services and infrastructure components.
Hands-on experience managing telemetry pipelines with Terraform, CI/CD tooling, and service instrumentation in Kubernetes-based environments.
Strong understanding of system-level design princip

Benefits

Health insurance · Paid time off
