Director, Infrastructure & SRE

Tailorcare2023 · Montreal, Canada
Full-time · On-site · 2w ago
API Gateway · AWS · CI/CD · Compliance · Datadog · DNS

About the role

The Director of Infrastructure & SRE owns the function end-to-end: reliability, security, scalability, and operational governance of TailorCare's infrastructure, plus the team that delivers it. You will be a peer to the Director of Software Engineering, Director of Data Engineering, and Director of Data Science, own the Infrastructure & SRE scorecard in front of the executive team, and lead vendor escalations with Salesforce, AWS, and Cresta, among others, at the Director level.

This is a player-coach role. In year one you will spend roughly 60% of your time hands-on (writing Terraform, leading incidents, doing architecture work) and 40% building the team and the practice. As the team scales, that ratio shifts toward leadership, but you will never stop being technical. This is not a slideware role. We are not hiring a manager who reviews architecture diagrams from a distance. We are hiring an operator who codes, runs incidents, owns the platform, and ships.

Primary Responsibilities

Infrastructure as Code

  • Converge all AWS resources to Terraform; eliminate manual provisioning
  • Establish reproducible environments (dev, staging, production) with proper isolation and parity
  • Standardize CI/CD pipelines across all engineering teams

Site Reliability

  • Define and operate SLOs, SLIs, and error budgets for all production systems (web/mobile applications, Salesforce, data processing, telephony stack)
  • Build observability (metrics, logs, traces, alerting) across AWS, Salesforce, telephony/omni-channel, and Cresta integrations
  • Stand up the infrastructure on-call rotation, incident management, and post-incident review discipline, including RCAs
  • Own uptime, MTTR, and incident-volume trends as published metrics

Disaster Recovery & Business Continuity

  • Design and implement a tested DR strategy with documented RPO/RTO commitments
  • Validate recovery procedures on a recurring cadence
  • Align DR posture with HITRUST and HIPAA expectations

Integration Reliability

  • Stabilize Salesforce, telephony/omni-channel, and Cresta integrations; close persistent gaps in skills-based routing, warm transfers, and telephony data parity
  • Partner with Data Engineering on the reliability of data ingest paths (Fivetran, SFTP, S3) and Salesforce bulk API flows

Security & Compliance Engineering

  • Translate Security & Compliance policy into enforced infrastructure controls: IAM, encryption (at rest and in transit), network segmentation, secrets management, audit logging
  • Partner with Security & Compliance on HITRUST evidence, audit readiness, and remediation
  • Own vulnerability management across cloud and application layers

Email & Domain Infrastructure

  • Fix DNS, SPF, DKIM, DMARC, and IP reputation to resolve spam-folder deliverability impacting patient and operational communications
  • Own all TailorCare domain and email infrastructure

Developer Experience

  • Build and maintain test, staging, and ephemeral environments engineers actually use
  • Reduce cycle time and remove infrastructure friction from the SDLC
  • Establish self-service tooling so engineers ship without filing tickets

Team & Function Leadership

  • Hire, level, develop, and retain the Infrastructure & SRE team
  • Own the function's MBR contribution: scorecard, risks, decisions needed
  • Partner with Engineering, Data, Product, and Security & Compliance leadership as a peer
  • Other duties as assigned

Requirements

  • 10+ years in Infrastructure Engineering, SRE, or DevOps, with 3+ years in a senior IC or tech lead role and 2+ years directly managing engineers
  • Recent hands-on technical work (within the last 12 to 18 months) in Terraform, AWS, and production incident response
  • Track record of hiring, leveling, and developing infrastructure or SRE engineers
  • Deep AWS expertise (VPC, IAM, ECS/EKS, Lambda, RDS, DynamoDB, S3, API Gateway, WAF, Connect)
  • Production Terraform experience at scale (modules, state management, multi-environment)
  • Hands-on with observability stacks (CloudWatch, Datadog, Grafana, or equivalents)
  • Demonstrated experience standing up SRE practices: SLOs, on-call, incident management, blameless postmortems
  • Experience operating in a HIPAA or comparably regulated environment (PCI, SOC 2 Type II, HITRUST, FedRAMP)
  • CI/CD pipeline design (GitHub Actions, GitLab CI, or equivalent)
  • Ability and willingness to travel up to 10% as needed for onsite meetings, team collaboration, and company events
  • Salesforce platform integration and operational experience
  • Amazon Connect or comparable contact center telephony platforms
  • Data platforms (Databricks, Snowflake, Fivetran)
  • HITRUST certification participation (e1 or r2)
  • AI/LLM-assisted operations tooling
  • Experience scaling an infrastructure function in a healthcare or other regulated growth-stage company
  • You own outcomes. When something breaks, you fix it and improve the system so it does not happen again.
  • You write code and ship infrastructure. You lead by doing, not by dele

Benefits

  • Health insurance
  • Vision insurance
