Site Reliability Engineer - Platform Infrastructure Engineering

ID.me · Mountain View, CA
Full-time · On-site · Posted 1mo ago
AWS · Azure · CI/CD · Compliance · Docker · Documentation

Responsibilities

  • Build and maintain automated reliability tooling, infrastructure as code, and observability systems that enhance uptime and service performance.
  • Develop monitoring, logging, and alerting frameworks (e.g., Prometheus, Grafana, OpenTelemetry) to detect and remediate issues proactively.
  • Implement automated architectural reviews and reliability guardrails for agent-developed applications to ensure machine-generated code meets long-term maintainability and performance standards.
  • Partner with engineering teams to design and implement scalable, fault-tolerant systems that meet defined SLIs and SLOs.
  • Automate repetitive operational tasks and develop self-healing and auto-remediation mechanisms to minimize human intervention.
  • Participate in on-call rotations and lead incident response efforts, performing post-incident reviews and driving systemic improvements.
  • Improve the deployment and release process using CI/CD pipelines and progressive delivery techniques to ensure stability and safety.
  • Champion observability, reliability, and operational readiness reviews as part of the development process.
  • Collaborate with Security and Compliance teams to ensure production systems meet FedRAMP, NIST, and internal policy requirements.
  • Contribute to documentation, runbooks, and internal tooling to enhance knowledge sharing and operational maturity across teams.

Requirements

  • Bachelor's degree in Computer Science, Software Engineering, or a related technical field.
  • 3-5 years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering.
  • 2+ years of hands-on experience managing and scaling services in cloud environments such as AWS, GCP, or Azure.
  • 1+ years of proficiency in at least one modern programming language (e.g., Java, Go, Python, Ruby, JavaScript).
  • Strong understanding of containerization and orchestration technologies (Docker, Kubernetes).
  • Experience implementing and maintaining CI/CD pipelines and automation frameworks.
  • Working knowledge of observability systems: metrics, tracing, and logging.

Benefits

Health insurance

Additional Information

Company Overview

ID.me is the next-generation digital identity wallet that simplifies how individuals securely prove their identity online. Consumers can verify their identity with ID.me once and seamlessly log in across websites without having to create a new login and verify their identity again. Over 152 million users experience streamlined login and identity verification with ID.me at 20 federal agencies, 45 state government agencies, and 70+ healthcare organizations. More than 600 consumer brands use ID.me to verify communities and user segments to honor service and build more authentic relationships. ID.me's technology meets the federal standards for consumer authentication set by the Commerce Department and is approved as a NIST 800-63-3 IAL2 / AAL2 credential service provider by the Kantara Initiative. ID.me is committed to "No Identity Left Behind" to enable all people to have a secure digital identity. To learn more, visit https://network.id.me/.

Role Overview

We are seeking a Site Reliability Engineer to join our Core Platform Engineering organization. The SRE team builds the automation, observability, and operational foundations that ensure ID.me's services are reliable, scalable, and secure. As an SRE, you will play a pivotal role in building the platform and governance processes required to safely scale, deploy, and operate a high volume of machine-generated applications and features. You will design and implement the automated guardrails that maintain our high standards for resilience and security in an AI-accelerated development environment. You'll focus on infrastructure automation, observability, performance optimization, and incident response, partnering closely with Software Engineering teams to foster a culture of reliability and operational excellence.

This role is based out of our Mountain View, CA or McLean, VA offices and requires full-time in-office attendance, 5 days per week.