Site Reliability Engineer

External

Recorded Future · Gothenburg, Sweden

Full-time · On-site · 3w ago

Apache · AWS · CI/CD · Compliance · Documentation · Elasticsearch

Responsibilities

Ensure the platform's performance, capacity, scalability, reliability, resiliency, security, compliance, support, and cost efficiency, and meet its SLAs, SLOs, RPOs, and RTOs, either directly or in collaboration with other teams.
Make systemic improvements both proactively and for recurring issues.
Perform comprehensive Root Cause Analysis for outages.
Design, implement, and maintain scalable and reliable infrastructure on AWS.
Develop and manage observability solutions using tools such as Grafana, ELK (Elasticsearch, Logstash, Kibana), and Prometheus to monitor system health and performance.
Automate infrastructure provisioning and configuration using Terraform and Chef.
Participate in a 24/7 on-call rotation to respond to and resolve production incidents.
Collaborate with engineering teams to ensure applications are designed for high availability and resilience.
Proactively identify and address performance bottlenecks and potential issues.
Drive continuous improvement through automation, process optimization, and post-incident reviews.

Requirements

3+ years of experience in a Site Reliability Engineer, DevOps Engineer, or similar role.
Extensive hands-on experience with Amazon Web Services (AWS), including a deep understanding of networking concepts within AWS.
Expert-level troubleshooting and diagnostic skills.
Proven track record of reducing system downtime.
Ability to grasp complex architectures.
Advanced Linux skills (engineering fundamentals, networking, storage, operating systems).
Experience managing and optimizing observability suites (e.g., Grafana, ELK Stack).
Strong proficiency in Terraform and Chef.
A strong preference for automating tasks and implementing solutions via Infrastructure as Code rather than manual changes.
Skilled in creating clear, concise incident reports and technical documentation.
Ability to stay calm under pressure during an outage.
Excellent collaboration and communication skills.
A team player who is also self-motivated.
Knowledge and experience with Kubernetes.
Familiarity with message brokers such as RabbitMQ and Apache Kafka.
Experience with NoSQL databases, particularly MongoDB and Elasticsearch.
Familiarity with OpenTelemetry.
Experience with large distributed systems and microservices architecture.
Experience with CI/CD pipelines.
Why should you join Recorded Future?
Want more info?
Blog & Podcast: Learn everything you want to know (and maybe some things you'd rather not know) about the world of cyber threat intelligence
LinkedIn, Instagram & Twitter: What's happening at Recorded Future
The Record: The Record is a cybersecurity news publication that explores the untold stories in this rapidly changing field
Timeline: History of Recorded Future
Recognition: Check out our awards and announcements
We are committed to maintaining an environment that attracts and retains talent from a diverse range of experiences, backgrounds and lifestyles. By ensuring all feel included and respected for being unique and bringing their whole selves to work, Recorded Future is made a better place every day.
If you need any accommodation or special assistance to navigate our website or to complete your application, please send an e-mail with your request to our recruiting team at careers@recordedfuture.com
Recorded Future will not discharge, discipline or in any other manner discriminate against any employee or applicant for employment because such employee or applicant has inquired about, discussed, or di

Benefits

Health insurance · Vision insurance

Additional Information

With 1,000+ intelligence professionals serving over 1,900 clients worldwide, Recorded Future is the world's most advanced, and largest, intelligence company! Recorded Future is seeking a highly motivated and experienced Site Reliability Engineer (SRE) to join our growing team. In this role, you will be instrumental in ensuring the reliability, scalability, and performance of our critical systems. You will work closely with development teams to build and maintain robust infrastructure, implement automation, and foster a culture of operational excellence. This position requires a strong understanding of cloud environments, observability, and infrastructure as code principles.
