Staff Site Reliability Engineer

External

TransUnion · Chicago, IL

Full-time · Hybrid · Today

Bash · Capacity Planning · CI/CD · Cross-functional Collaboration · Datadog · DNS

Benefits

Health insurance · Dental insurance · Vision insurance

Additional Information

Team Overview

At TransUnion, this role will report to a DevOps Director. The Site Reliability Engineering team drives reliability strategy, elevates engineering standards, and owns some of the most complex and consequential work on the platform.

As a Staff Site Reliability Engineer at TransUnion, you will serve as a senior technical leader and force multiplier on the SRE team. Operating with full autonomy, you will drive reliability strategy, lead high-risk technical initiatives, and set the engineering standards that elevate the entire team. You'll bring deep expertise across GCP, Kubernetes, CI/CD pipelines, and monitoring platforms - contributing to strategic decisions on major platform components while fully participating in on-call rotation. Whether stepping in to lead the team, owning complex capacity and security work, or anchoring incident response with calm and maturity, your impact will be felt across the platform and the people around you.

This is a hybrid position and involves regular performance of job responsibilities virtually as well as in-person at an assigned TU office location for a minimum of two days a week.

Role Overview and Core Responsibilities

Technical Leadership & Strategic Influence

- Recognized expert across multiple systems; actively contributes to architectural and strategic decisions around major platform components.
- Leads research, testing, implementation, and continuous improvement for new systems and tooling.
- Performs complex, high-impact work including capacity planning, load testing, and security improvements.

Operational Excellence & On-Call

- Fully participates in the team's on-call rotation; models calm, effective, and blameless incident response.
- Serves as a significant technical contributor during major incidents and problem resolution.
- Plans and leads high-risk maintenance events with minimal to no customer impact.

Standards & Team Elevation

- Elevates team standards through new tooling, processes, procedures, and effective communication.
- Capable of stepping in to lead and represent the team - a trusted resource during transitions or coverage gaps.
- Sets new professional benchmarks in technical quality, engineering culture, and cross-functional collaboration.

Required Knowledge and Experiences

- 5+ years of experience in Cloud Architecture, Site Reliability Engineering, Platform Engineering, or related fields - with a proven track record of designing and delivering at enterprise scale.
- Deep, hands-on expertise with Google Cloud Platform (GCP) and Kubernetes (K8s) - running high-volume, high-availability workloads with 99.999% reliability targets.
- Mastery of CI/CD pipeline architecture - designing end-to-end delivery systems that are fast, safe, and built for scale.
- Expert-level command of monitoring, observability, and alerting platforms (e.g., Datadog, Prometheus, Grafana, PagerDuty) - you define what good looks like.
- Deep Linux expertise - from kernel internals and system performance tuning to hardening and troubleshooting at the OS level in production environments.
- Strong command of database architecture - including relational (PostgreSQL, MySQL, Cloud SQL) and NoSQL (Bigtable, Firestore, Redis) systems, with experience designing for high availability, replication, failover, and performance at scale.
- Advanced networking knowledge - including VPCs, subnets, DNS, load balancing, firewall rules, VPNs, private service connect, and hybrid connectivity patterns across cloud and on-prem environments.
- Proven expertise in Infrastructure-as-Code (IaC) - designing and enforcing scalable, reusable frameworks using Terraform, Pulumi, or equivalent tools.
- Strong proficiency in scripting and automation (e.g., Python, Bash, Go) - building the tools and workflows that eliminate toil and accelerate delivery.
- Hands-on experience designing and integrating AI/ML-powered solutions into cloud-native platforms - including familiarity with LLM orchestration, vector databases, model serving infrastructure, and AI observability - with the ability to evaluate emerging tools and translate them into reliable, production-grade capabilities.

Benefits that support every part of your life:

At TransUnion, we design benefits to help you feel well, do well, and plan well, from day one.

For Your Health: Enjoy day-one eligibility for medical, dental, and vision coverage, plus supplemental plan options. Spousal, domestic partner, and other eligible dependent coverage is available on select plans. Choose tax‑advantaged HSA and FSA accounts to make everyday care more affordable.

For Your Protection: We've got your back with company‑paid basic life and AD&D, optional voluntary life and AD&D for you and your family, and short‑ and long‑term disability. You can also opt into a legal plan, pet insurance, and travel accident coverage.

For Your Family: From adoption assistance and fertility planning coverage to ca
