
Staff Site Reliability Engineer I

External
Remote · Europe
Full-time · On-site · 1mo ago
AWS · Bash · Capacity Planning · CI/CD · Compliance · Datadog


Responsibilities

  • Own the technical direction of Remote's SRE/Platform domain, its architecture, tooling, and long-term roadmap
  • Define and drive the reliability strategy across the platform: SLOs/SLIs, error budgets, observability, and incident management maturity
  • Lead complex, cross-team infrastructure initiatives from discovery through delivery, delegating effectively and keeping projects aligned with business goals
  • Identify and lead AI enablement initiatives across the engineering organisation, exploring where AI can reduce operational overhead, accelerate development workflows, improve incident response, and unlock new capabilities for engineering teams
  • Drive AI-powered automation for platform operations: intelligent alerting, automated incident triage, self-healing infrastructure, and AI-assisted runbooks, reducing toil and freeing engineers to focus on higher-leverage work
  • Contribute to capacity planning and cost-efficiency of Remote's infrastructure
  • Mentor senior engineers, raising the technical bar through code reviews, design feedback, and hands-on guidance
  • Collaborate with the Security team on platform hardening, threat mitigation, and compliance
  • Be a steward of engineering quality across the SRE team, championing best practices, managing technical debt deliberately, and raising standards over time
  • Contribute to hiring, onboar

Requirements

  • Excellent communication and interpersonal skills
  • Holistic debugging skills
  • Security knowledge and capabilities from a defensive and offensive standpoint

Benefits

  • Vision insurance
  • Remote work options

Additional Information

About Remote

Remote is solving modern organizations' biggest challenge - navigating global employment compliantly with ease. We make it possible for businesses of all sizes to recruit, pay, and manage international teams. With our core values at heart and future focused work culture, our team works tirelessly on ambitious problems, asynchronously, around the world. You can find Remoters working from 6 different continents (Antarctica left to go!) and all of our positions are fully remote. With Innovation as one of the core values, we have built Automation and AI capabilities into the requirements for every role.

We encourage every member of the Remote team to bring their talents, experiences and culture to the table to help us build the best-in-class HR platform. If you are energetic, curious, motivated and ambitious, be part of our world. Apply now and define the future of work!

What this job can offer you

As a Staff SRE at Remote, you will own the technical direction of our SRE platform, shaping its architecture, reliability strategy, and long-term evolution. This is a leadership role as much as a technical one: you'll drive platform-wide initiatives, set the reliability bar for engineering teams across the organisation, and be a force multiplier for the engineers around you.

A key part of this role is identifying and leading opportunities to leverage AI: from reducing operational toil to enabling engineering teams to build, ship, and operate software more effectively. You'll work with a high degree of autonomy, translating technical risks into business impact and aligning with Engineering Managers, Team Leads, and Product teams to ensure reliability and engineering efficiency are built into everything we do.
What you bring

Technical

  • 8+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering
  • Deep expertise in Kubernetes: operating, designing, and scaling production clusters
  • Proven experience designing and managing cloud infrastructure on AWS (or other cloud providers) at scale
  • Strong infrastructure-as-code practice with Terraform
  • Experience defining and operating reliability frameworks: SLOs, SLIs, error budgets, alerting strategies
  • Solid observability background: Datadog, Grafana/Prometheus, or similar
  • Proficiency with CI/CD platforms (GitLab CI, GitHub Actions, or similar) and deployment automation
  • Comfortable with Bash and scripting for automation; broader programming skills are a plus
  • Experience with container tooling (Docker) and the broader ecosystem around it
  • Curiosity and practical experience applying AI tools to infrastructure, operations, or developer tooling: whether through AI-assisted automation, LLM-powered workflows, or intelligent observability

Leadership & behavioural

  • Proven track record of driving platform-wide technical initiatives and influencing engineering direction without formal authority
  • Strong communicator: able to tailor messaging to technical and non-technical audiences, write clearly, and align stakeholders across teams
  • Self-directed: able to identify what needs attention, define the path forward, and execute with minimal supervision
  • Experience mentoring senior engineers and creating space for others to lead and grow
  • Comfortable navigating ambiguity, translating vague requirements into concrete solutions
  • Approaches technical problems with a business lens, understands the cost and value of engineering decisions

