Staff Site Reliability & DevOps Engineer - Observability
Responsibilities
- Design, build, and operate observability platforms based on Grafana and Prometheus
- Define and maintain metrics standards, dashboards, alerts, and SLOs
- Improve signal quality: reduce alert noise, tune thresholds, and improve runbooks
- Support incident response by providing actionable telemetry and post-incident analysis
- Integrate metrics, logs, and traces across distributed systems
- Work with engineering teams to instrument services correctly
- Automate observability configuration using infrastructure as code
- Contribute to reliability improvements through capacity planning and performance analysis
Required skills and experience
- Strong experience with Prometheus (scraping, federation, recording rules, alerting)
- Strong experience with Grafana (dashboards, alerting, templating, RBAC)
- Solid Linux and networking fundamentals
- Experience running observability stacks in Kubernetes environments
- Infrastructure as code experience (Terraform preferred)
- Familiarity with incident management and on-call practices
- Ability to debug production systems using metrics and logs
Requirements
- Experience with logs and traces (e.g. Loki, Tempo, OpenTelemetry)
- Experience operating large-scale or multi-cluster Kubernetes platforms
- Experience with cloud platforms (GCP, AWS, OCI)
- Exposure to SRE concepts such as error budgets and SLO-driven prioritisation
What success looks like
- Engineers trust dashboards and alerts to reflect system health
- Incidents are detected earlier and diagnosed faster
- Alert fatigue is reduced and on-call quality improves
- Observability is treated as a first-class platform capability
Please review our Global Candidate Data Privacy Statement to learn about Cision's commitment to protecting personal data collected during the hiring process.
Additional Information
At Cision, we believe in empowering every individual to make an impact. Here, your voice is heard, your ideas are valued, and your unique perspective fuels our collective success. As part of our global team, you'll thrive in an environment that champions curiosity, collaboration, and innovation, all while making meaningful contributions to the brands we accelerate. Join us in shaping the future of communication and building authentic connections that matter. Whether you're solving complex problems or driving bold innovations, your growth is our success, and together, we'll create the conversations of tomorrow. Empower your impact at Cision. Be seen, be understood, be you.
This role focuses on designing, operating, and evolving observability platforms with a strong emphasis on metrics, logging, and alerting. The primary tooling is Grafana and Prometheus, with responsibility for ensuring production systems are observable, reliable, and operable at scale. The role works closely with platform, infrastructure, and application teams.