Principal Site Reliability Engineer

iSpot.tv · Bellevue, WA

Full-time · On-site · 1mo ago

Apache · ArgoCD · AWS · CI/CD · Helm · Incident Response

Responsibilities

System Reliability and Operations (SRE Focus)
Platform Design and Management: Architect, build, and maintain scalable, highly available, and reliable cloud infrastructure in AWS leveraging modern container orchestration technologies.
Data Pipeline Reliability: Serve as the reliability and cost optimization expert for high-volume, data-intensive workloads. Focus on optimizing and ensuring the stability of distributed data processing engines, specifically Apache Spark and related ecosystems (e.g., EMR, Databricks, Glue).
Observability and Monitoring: Establish comprehensive observability practices by defining SLIs/SLOs, implementing advanced monitoring, alerting, and logging solutions to quickly identify and resolve system anomalies.
Automation: Drive automation across all operational aspects, including infrastructure provisioning (Terraform), scaling, deployment, and incident response, minimizing toil and manual effort.
Incident Management: Lead and participate in the incident response lifecycle, performing thorough post-mortems to derive actionable insights and implement preventative measures to improve system resilience.
Developer Experience and Productivity (DevEx Focus)
Platform Strategy: Design, implement, and champion self-service tools, internal developer portals, and services that empower engineering teams to manage their infrastructure and deployments independently and efficiently.
AI Developer Tools: Lead the standardization of AI developer assistants by architecting and maintaining global 'steering files' and context-configuration standards, ensuring AI-generated code aligns with our specific patterns, security protocols, and architectural guardrails.
KPIs: Develop, implement, and maintain a set of key performance indicators (KPIs) to measure and improve

Benefits

Vision insurance · Equity / stock options

Additional Information

Immigration / Work Authorization Notice: Applicants must be currently authorized to work in the United States. iSpot is not able to sponsor or take over sponsorship of an employment visa for this position at this time.

iSpot competes for the best talent. Our compensation packages consist of salary and equity in one of Seattle's hottest start-ups, as well as other standard benefits. Most importantly, we provide a really interesting working experience, and the chance to contribute to the success of something great.

What You'll Be Part Of: iSpot.tv is changing how brands, agencies, and networks measure and assess the impact of TV advertising. We deal with BIG data, operating mainly in AWS with multiple Kubernetes clusters and thousands of servers. We are looking for an experienced SRE leader with the skills and passion to make a significant impact on our ecosystem. You will have a wide array of projects to tackle, with ample opportunities for growth.

You will be a key member of our SRE leadership team, focused on empowering developers to build, test, and deploy applications faster and more efficiently. You will both lead the team and remain hands-on in designing, building, and maintaining the tools, platforms, and processes that improve our engineering teams' productivity and streamline the software development lifecycle. Your work will directly impact developer happiness and the speed at which we can deliver innovative features to our customers.
