Site Reliability Engineer

External
Flutterbe · Sofia, Bulgaria
Full-time · Hybrid · 1w ago
AWS · Azure · Capacity Planning · Chaos Engineering · Compliance · Flutter

Responsibilities

System Reliability & Performance
  • Maintain 99.9%+ uptime for the Observability platform that monitors and provides insights for systems serving millions of concurrent users
  • Design and support complete monitoring, alerting, and observability systems. Take responsibility for the tooling infrastructure that connects with various cloud services and platforms such as Grafana, Splunk, and CloudWatch.
  • Conduct capacity planning and performance optimization to ensure systems can handle peak loads during major sporting events
  • Establish and uphold Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for all essential services with assistance from Service Management

Incident Management & Response
  • Collaborate with Service Management to foster continuous improvement via blameless post-mortems. Detect repeated failure trends across the platform. Work with product teams on resilience upgrades that improve system dependability.
  • Work together with Service Management on post-incident reviews, offering technical insights and assisting in the adoption of preventative measures to minimize repeat incidents
  • Support the development and upkeep of detailed runbooks and incident response methods alongside Service Management teams

Observability & Monitoring Excellence
  • Deploy and maintain comprehensive monitoring dashboards and visualization tools for real-time system visibility across all Flutter platforms
  • Create custom dashboards and visual analytics for business metrics, technical indicators, and operational insights tailored to different partner needs
  • Configure and optimize data ingestion from diverse sources including time-series databases, log aggregation systems, cloud monitoring services, and custom APIs
  • Implement and refine alerting rules and notification workflows that reduce alert fatigue while ensuring critical issues are escalated promptly
  • Develop and sustain APM capabilities, incorporating instrumentation and telemetry collection into the current observability ecosystem
  • Work together with development teams to define, implement, and instrument custom business and technical metrics that provide actionable insights

Testing & Chaos Engineering
  • Own and maintain the chaos testing framework and tools. Define standard failure s

Additional Information

Job title: Site Reliability Engineer
Location: Sofia, Bulgaria

Overview of the role:

Flutter Technology is looking for a Site Reliability Engineer to guarantee the stability, uptime, and efficiency of our essential gaming and betting platforms throughout our worldwide operations. This position blends engineering skills with operational proficiency to sustain continuous service availability for millions of users globally, including via on-call support.

As a member of Flutter Functions, you will work closely with development groups, infrastructure experts, and business partners. Together, you will maintain high-performance, scalable systems supporting our iGaming & Sports platforms in several markets. You will be the expert responsible for building and managing enterprise-level observability, disaster recovery, and business continuity features across our AWS Cloud environment.

The ideal candidate combines a strong understanding of SRE practices with public cloud experience (AWS, Azure, GCP). This ensures our systems maintain high availability, recover rapidly from incidents, and offer comprehensive observability on our platform. You will be responsible for making sure our systems are resilient, recoverable, and subjected to regular fire drills and extensive testing.

You will interact with senior stakeholders during customer concern escalations and post-incident discussions, which requires excellent communication skills to convey technical challenges and operational updates. You should feel confident collaborating with cross-departmental teams in changing environments, working alongside development staff, infrastructure professionals, and business stakeholders across various functions and brands.

This role is critical to maintaining Flutter Entertainment's operational excellence. It also ensures we meet strict regulatory compliance in the highly regulated gaming industry.
The role requires a passion for system reliability and a proactive approach to spotting and fixing potential issues before they affect customers. It also involves implementing solutions that support Flutter's multi-regional, multi-market infrastructure.

This role follows a hybrid approach to working, allowing you to combine working from home with working in our modern offices. The exact pattern is agreed between you and your manager to find what works best for you both, while recognising that quality time together is essential for keeping us mission-aligned. Our teams work from a lively location nestled within this historic city. Enjoy the best of both worlds with winter and summer offices, tantalizing free snacks, and a gaming paradise for endless entertainment.

