SRE specialist
About the role
We are seeking a hands-on Site Reliability Engineer within the Intelligent Operations Department's SRE & Resiliency team. This role operates across Azure, AWS, GCP, and on‑prem environments, embedded in the broader enterprise resiliency and production reliability strategy. The SRE will function as part of a special investigations unit that empowers and enables Applicative Support, Infrastructure Support, and the Incident Management team by coaching, guiding, and leading investigations into active incidents and proactive reliability improvements. Core responsibilities include deep investigations, advanced observability (OpenTelemetry, Dynatrace, Elastic), auto-healing tooling, SLI/SLO stewardship, and business-aligned reliability reporting.

What you'll do here:

Incidents & Investigations
Lead high‑severity investigations and RCA with App/Infra/Incident teams.
Proactively find systemic risks and resilience gaps; drive durable fixes.
Run blameless post‑mortems and coach teams.

Observability (OTel, Dynatrace, Elastic)
Implement end‑to‑end traces/metrics/logs with consistent semantics.
Build insights and anomaly detection; create topology‑aware health models.
Integrate synthetics, contract tests, and distributed tracing.

Auto‑Healing & Reliability Tooling
Build policy‑driven remediation (circuit breakers, throttling, retries).
Enable progressive delivery (blue/green, canary) with safe rollbacks.
Provide resilience tooling: validation, safeguards, chaos, DR, runbooks.

SLI/SLOs & Reporting
Define user‑centric SLIs/SLOs; enforce error budget policies.
Publish reliability reports and scorecards; drive continuous improvement.

Coaching & Leadership
Upskill support/incident teams; standardize playbooks and training.
Promote automation‑first, data‑driven, resilience culture.

Cloud & Platform Reliability
Operate across Azure/AWS/GCP/on‑prem; GLB, DNS, TLS, CDN, failover.
Improve K8s/mesh (AKS/EKS/GKE, Istio/Linkerd) and data/streaming resilience.

AI for Reliability
Use AI for causal detection/anomalies to cut MTTR.
Develop reliability copilots; monitor AI systems for reliability and cost.

What you bring to the table:

8+ years of experience in SRE/Platform/Infrastructure/Software Engineering operating large-scale production systems across multi-cloud and on‑prem.

Strong proficiency in:
Observability: OpenTelemetry instrumentation and standards; Dynatrace (Davis AI, SmartScape, service-level analysis, baselining); Elastic/ELK (Beats/Agent, ingest pipelines, ILM, Kibana).
Reliability engineering: SLIs/SLOs/SLAs, error budgets, alert strategy, capacity modeling, graceful degradation, circuit breaking, retries/backoff.
CI/CD and deployment patterns: blue/green, canary, progressive delivery, automated rollback, pipeline safeguards.
Kubernetes and service meshes; platform-level resilience and operability.
Data and event systems: replication, snapshots/PITR, CDC, streaming (Kafka, RabbitMQ, Pub/Sub) with DLQs/reprocessing; dependency risk management.
Networking and traffic: DNS
Benefits
Additional Information
Our employees are at the heart of everything we do. Together, we help people, businesses, and society prosper in good times and be resilient in bad times. Our employee promise represents Intact's commitment to you in exchange for living our Values, striving to do your best work, being open to change and investing in your career. In return, we promise to provide support, opportunities and performance-led financial rewards at a workplace where you can shape the future, win as a team and grow with us.

Pay at Intact is about much more than just salary.

Flexible work arrangements and a hybrid work model
Possibility to purchase up to 5 extra days off per year
Multiple benefits offered to support physical and mental wellbeing, including telemedicine, Wellness account and much more
Share plan & other savings: up to 12% of salary or even more (ask how you could earn guaranteed income for life)
Salary range (but not limited to): 109,900 - 134,300
Annual bonus target, based on the base salary, with a potential payout of up to double the target (subject to personal and company performance): 15%

As part of our commitment to Win As A Team, we share our success with employees through our annual bonus plan and Employee Share Purchase Plan (ESPP), with Intact matching 50% of your net shares.

Our pension offerings provide flexibility and long-term security for our employees beyond their careers. We are one of the few companies offering the opportunity to receive guaranteed income for life via our defined benefit pension plan.

Salary for the candidate will be determined taking into consideration a number of factors including: experience, skills, qualifications, anticipated contribution to role, internal equity, etc. The salary range presented above is based on a 35-hour workweek and would represent a majority of different candidate profiles. However, we encourage candidates who may fall outside of this range to apply as well.