Staff Site Reliability & DevOps Engineer - Observability

Brandwatch · Remote

Full-time · Remote · 1mo ago · 30+ days old, may be filled

AWS · Capacity Planning · GCP · Grafana · Incident Response · Kubernetes

Responsibilities

Design, build, and operate observability platforms based on Grafana and Prometheus
Define and maintain metrics standards, dashboards, alerts, and SLOs
Improve signal quality: reduce alert noise, tune thresholds, and improve runbooks
Support incident response by providing actionable telemetry and post-incident analysis
Integrate metrics, logs, and traces across distributed systems
Work with engineering teams to instrument services correctly
Automate observability configuration using infrastructure as code
Contribute to reliability improvements through capacity planning and performance analysis

Required skills and experience

Strong experience with Prometheus (scraping, federation, recording rules, alerting)
Strong experience with Grafana (dashboards, alerting, templating, RBAC)
Solid Linux and networking fundamentals
Experience running observability stacks in Kubernetes environments
Infrastructure as code experience (Terraform preferred)
Familiarity with incident management and on-call practices
Ability to debug production systems using metrics and logs

Requirements

Experience with logs and traces (e.g. Loki, Tempo, OpenTelemetry)
Experience operating large-scale or multi-cluster Kubernetes platforms
Experience with cloud platforms (GCP, AWS, OCI)
Exposure to SRE concepts such as error budgets and SLO-driven prioritisation

What success looks like

Engineers trust dashboards and alerts to reflect system health
Incidents are detected earlier and diagnosed faster
Alert fatigue is reduced and on-call quality improves
Observability is treated as a first-class platform capability

Please review our Global Candidate Data Privacy Statement to learn about Cision's commitment to protecting personal data collected during the hiring process.

Benefits

Health insurance
Equity / stock options

Additional Information

At Cision, we believe in empowering every individual to make an impact. Here, your voice is heard, your ideas are valued, and your unique perspective fuels our collective success. As part of our global team, you'll thrive in an environment that champions curiosity, collaboration, and innovation, all while making meaningful contributions to the brands we accelerate. Join us in shaping the future of communication and building authentic connections that matter. Whether you're solving complex problems or driving bold innovations, your growth is our success, and together, we'll create the conversations of tomorrow. Empower your impact at Cision. Be seen, be understood, be you.

This role focuses on designing, operating, and evolving observability platforms with a strong emphasis on metrics, logging, and alerting. The primary tooling is Grafana and Prometheus, with responsibility for ensuring production systems are observable, reliable, and operable at scale. The role works closely with platform, infrastructure, and application teams.
