Principal Site Reliability Engineer
About the role
DigiCert is a global leader in intelligent trust. We protect the digital world by ensuring the security, privacy, and authenticity of every interaction. Our AI-powered DigiCert ONE platform unifies PKI, DNS, and certificate lifecycle management to secure infrastructure, software, devices, messages, AI content, and agents. Learn why more than 100,000 organizations, including 90% of the Fortune 500, choose DigiCert to stop today's threats and prepare for a quantum-safe future at www.digicert.com.
Job summary
The Platform Ops team within CloudOps is responsible for the reliability, scalability, and modernization of DigiCert's cloud infrastructure. As a Principal SRE, you will own the intersection of software engineering and operations, driving automation-first practices, reducing toil, and accelerating our cloud transformation across AWS, Azure, and GCP environments. You will be a technical force multiplier: raising reliability standards across the organization, defining SLOs that matter, and building the internal platforms and tooling that enable product teams to ship with confidence.
Responsibilities
- Reliability Engineering
  - Define, implement, and own SLIs, SLOs, and error budgets for critical platform services
  - Lead blameless post-mortems and drive systemic reliability improvements across the platform
  - Design and implement observability pipelines (metrics, logs, traces) using tools such as Splunk, Prometheus, Grafana, or OpenTelemetry
  - Participate in on-call rotation and serve as an incident commander for P0/P1 events
- Cloud Modernization
  - Architect and execute migration strategies from legacy infrastructure to cloud-native patterns (containers, serverless, managed services)
  - Champion adoption of Kubernetes, service mesh, and managed cloud services (EKS, GKE, AKS)
  - Evaluate and introduce emerging cloud technologies that improve availability, cost efficiency, and developer experience
  - Partner with architecture and security teams to embed reliability and compliance into platform design
- Automation & Platform Development
  - Build and maintain infrastructure-as-code using Terraform across multi-cloud environments
  - Develop internal tooling, self-service platforms, and golden-path templates that reduce operational burden for development teams
  - Automate operational workflows including provisioning, scaling, patching, and secret rotation
  - Contribute to and maintain CI/CD pipelines (GitHub Actions) to enable safe, frequent deployments
- Engineering Leadership
  - Mentor mid-level engineers on SRE principles, distributed systems, and infrastructure best practices
  - Collaborate cross-functionally with product, security, and compliance teams to deliver on platform roadmap commitments
  - Document architectural decisions, runbooks, and platform standards; raise the engineering bar through code and design reviews
What you will have
- 5+ years of experience in SRE, platform engineering, or infrastructure engineering roles
- Deep proficiency in at least one major cloud provider (AWS, GCP, or Azure) with working knowledge of multi-cloud environments
- Strong software engineering skills in Python, Go, or Bash; comfortable writing production-grade automation and tooling
- Hands-on Kubernetes experience: cluster operations, workload management, networking (CNI/service mesh), and security (RBAC, pod security)
- Infrastructure-as-code expertise with Terraform or equivalent; experience with GitOps workflows
- Proven experience designing and operating observability systems and responding to production incidents at scale
- Strong understanding of networking fundamentals: DNS, TLS/PKI, load balancing, and zero-trust networking concepts
Requirements
- Experience in PKI, certificate lifecycle management, or security-adjacent infrastructure
- Familiarity with compliance frameworks such as SOC 2, FedRAMP, or ISO 27001 in cloud environments
- Prior experience driving cloud migration or modernization programs at scale
- Contributions to open-source infrastructure or platform projects
- Professional-level AWS, GCP, Azure, or Kubernetes certifications (e.g., AWS Solutions Architect Professional, CKA/CKS)
What success looks like
Working at DigiCert CloudOps
- Greenfield modernization: we are actively migrating workloads and building new platform capabilities, so you'll shape the architecture, not just maintain it
- Engineering-first culture with a strong bias toward automation, GitOps, and platform thinking
- Cross-functional visibility: PlatformOps partners directly with product, security, and compliance, so your work has organization-wide impact
- Competitive compensation, equ
Benefits