Staff Site Reliability & DevOps Engineer - Observability

Brandwatch · Remote
Full-time · Remote · 1mo ago
AWS · Capacity Planning · GCP · Grafana · Incident Response · Kubernetes

Responsibilities

  • Design, build, and operate observability platforms based on Grafana and Prometheus
  • Define and maintain metrics standards, dashboards, alerts, and SLOs
  • Improve signal quality: reduce alert noise, tune thresholds, and improve runbooks
  • Support incident response by providing actionable telemetry and post-incident analysis
  • Integrate metrics, logs, and traces across distributed systems
  • Work with engineering teams to instrument services correctly
  • Automate observability configuration using infrastructure as code
  • Contribute to reliability improvements through capacity planning and performance analysis

Required skills and experience

  • Strong experience with Prometheus (scraping, federation, recording rules, alerting)
  • Strong experience with Grafana (dashboards, alerting, templating, RBAC)
  • Solid Linux and networking fundamentals
  • Experience running observability stacks in Kubernetes environments
  • Infrastructure as code experience (Terraform preferred)
  • Familiarity with incident management and on-call practices
  • Ability to debug production systems using metrics and logs

Requirements

  • Experience with logs and traces (e.g. Loki, Tempo, OpenTelemetry)
  • Experience operating large-scale or multi-cluster Kubernetes platforms
  • Experience with cloud platforms (GCP, AWS, OCI)
  • Exposure to SRE concepts such as error budgets and SLO-driven prioritisation

What success looks like

  • Engineers trust dashboards and alerts to reflect system health
  • Incidents are detected earlier and diagnosed faster
  • Alert fatigue is reduced and on-call quality improves
  • Observability is treated as a first-class platform capability

Please review our Global Candidate Data Privacy Statement to learn about Cision's commitment to protecting personal data collected during the hiring process.

Benefits

  • Health insurance
  • Equity / stock options

Additional Information

At Cision, we believe in empowering every individual to make an impact. Here, your voice is heard, your ideas are valued, and your unique perspective fuels our collective success. As part of our global team, you'll thrive in an environment that champions curiosity, collaboration, and innovation, all while making meaningful contributions to the brands we accelerate. Join us in shaping the future of communication and building authentic connections that matter. Whether you're solving complex problems or driving bold innovations, your growth is our success, and together, we'll create the conversations of tomorrow. Empower your impact at Cision. Be seen, be understood, be you.

This role focuses on designing, operating, and evolving observability platforms with a strong emphasis on metrics, logging, and alerting. The primary tooling is Grafana and Prometheus, with responsibility for ensuring production systems are observable, reliable, and operable at scale. The role works closely with platform, infrastructure, and application teams.

