Senior Software Engineer - Observability & IRM

The Trade Desk · Bellevue
Full-time · On-site · 1mo ago
Grafana · Incident Response · Kubernetes · Observability · Prometheus

About the role

The Service Excellence (SE) team owns the tools and infrastructure that help engineers at The Trade Desk understand and operate production systems. The Incident Response Services (IRS) taskforce focuses on the on-call experience. The team is responsible for making incidents easier to detect, manage, and optimize using historical data.

Responsibilities

  • Incident management tooling - Build and maintain automation around the incident lifecycle: alerting, escalation, incident channels, retros, and SLA tracking
  • Help evaluate and migrate our logging stack - Participate in the re-evaluation of our logging vendor and collection architecture
  • Backstage/Service catalog - Extend our internal developer portal with K8s integrations, maturity models, and SLO adoption tooling
  • Alert quality tooling - Build the systems that give engineers better signal and less noise - smarter routing, better grouping, tighter feedback loops between alerts and the teams that own them

Requirements

  • Experience building and operating production infrastructure or internal developer tooling
  • Comfort working across the stack - this role touches distributed systems, Kubernetes, observability pipelines, and web-based tooling
  • Familiarity with observability concepts: logging, alerting, on-call workflows
  • Strong debugging instincts: You will be called on when things break
  • Clear communication: The team works closely with engineers across the company; you'll need to explain tradeoffs and advocate for solutions
  • Plus skills:
  • Experience with Grafana, Prometheus, or similar observability tools
  • Familiarity with Sumo Logic or other log management platforms
  • Prior work on developer portals or service catalog tooling (Backstage, OpsLevel, etc.)
  • Experience with Kubernetes at scale
  • A deep understanding of HunnyPt

Benefits

  • Health insurance
  • Dental insurance
  • Vision insurance
  • 401(k)
  • Equity / stock options

Additional Information

The Trade Desk is a global technology company and the world's leading independent platform for digital advertising, with nearly 4,000 employees across more than 30 offices. Our technology helps advertisers reach the right audiences across the open internet - from streaming TV and podcasts to mobile apps, news, and more. Advertising powers the content people love. By making it more transparent, effective, and responsible, we help support trusted journalism, quality entertainment, and creators worldwide. The world's brands and agencies rely on us to reach their customers and grow their businesses responsibly. The scale of our platform brings unique technical challenges - from processing massive datasets in real time to building systems that operate reliably on a global scale. When you work here, your impact is worldwide. We welcome diverse perspectives, encourage curiosity, and build teams that learn from one another. If you're driven to solve meaningful challenges, we'd love to meet you.

