Staff Site Reliability Engineer - Observability

Okta · Bellevue, WA
$194K–$267K/yr · Full-time · On-site · 1mo ago
AWS · DNS · Documentation · GCP · Grafana · Incident Response

Responsibilities

  • Automated Infrastructure: Design, build, and maintain scalable observability infrastructure using tools like Terraform.
  • GCP Observability Engineering: Optimize the collection, processing, and storage of observability data to ensure high reliability and low latency of our Splunk and Grafana services.
  • Incident Response: Participate in on-call rotations and lead post-incident reviews to drive systemic improvements and "observability-driven development."
  • Automation: Eliminate "toil" by automating the deployment and scaling of observability agents and collectors.

Required Skills & Experience (The Essentials)

  • Programming Proficiency: Strong coding skills in Python or Go for building internal tools and automating workflows.
  • Distributed Systems: Deep understanding of Linux internals, networking (TCP/IP, DNS, Load Balancing), and container orchestration (Kubernetes/GKE).
  • Problem Solving: A data-driven approach to debugging complex, cross-service performance bottlenecks.

Bonus Skills (The "Nice-to-Haves")

  • Telemetry Standards: Hands-on experience with OpenTelemetry (OTel), Vector, or similar frameworks for instrumenting applications.
  • Grafana Loki: Experience migrating from Splunk to Grafana Loki.
  • Other Cloud Platforms: Experience managing native observability tools within AWS.
  • P24517_3387022
  • The annual base salary range for this position for candidates located in the San Francisco Bay Area is between $194,000 and $267,000 USD.

The Okta Experience

  • Supporting Your Well-Being
  • Driving Social Impact
  • Developing Talent and Fostering Connection + Community

We are intentional about connection. Our global community, spanning over 20 offices worldwide, is united by a drive to innovate. Your journey begins with an immersive, in-person onboarding experience designed to accelerate your impact and connect you to our mission and team from day one.

If reasonable accommodation is needed to complete any part of the job application, interview process, or onboarding, please use this Form to request an accommodation.

Notice for New York City Applicants & Employees: Okta may use Automated Employment Decision Tools (AEDT), as defined by New York City Local

Benefits

  • Health insurance
  • Dental insurance
  • Vision insurance
  • 401(k)
  • Paid time off
  • Flexible schedule
  • Equity / stock options
  • Performance bonus
  • Parental leave

Additional Information

Secure Every Identity, from AI to Human

Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organizations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence. This is an opportunity to do career-defining work. We're all in on this mission. If you are too, let's talk.

We are seeking a highly technical Observability Site Reliability Engineer with a specialty in Google Cloud to own and expand our Observability ecosystem into GCP. In this role, you will move beyond simple monitoring to deliver a world-class, comprehensive, scalable Observability Platform that enables our SRE teams and business partners. You will treat infrastructure as code, using Terraform and strong coding proficiency in Go, Python, or Ruby, to automate the deployment of agents and collectors across complex distributed systems.

