Senior Site Reliability Engineer - Ireland

Arista Networks · Dublin, Ireland
Full-time · Remote · Posted 1mo ago (30+ days old, may be filled)
Ansible · AWS · Azure · Bash · CI/CD · Docker

Responsibilities

  • Design, build, and deploy production systems with a focus on scalability, reliability, observability, and performance, ensuring systems meet stringent security standards
  • Develop and maintain comprehensive automation solutions to eliminate toil and streamline operational efficiency across production environments
  • Proactively monitor production systems, establish intelligent alerting strategies, and implement automated incident response mechanisms to minimise downtime
  • Create and maintain detailed incident response runbooks; conduct thorough postmortem analyses following incidents to identify root causes and prevent recurrence
  • Collaborate with software engineering teams to identify and resolve infrastructural bottlenecks, designing innovative solutions that enhance product deployment workflows
  • Manage and optimise monitoring infrastructure using industry-standard tools, ensuring comprehensive visibility across all systems
  • Plan, communicate, and execute maintenance windows on production systems with minimal disruption to service availability
  • Triage platform and infrastructural issues with decisiveness and analytical rigour; engage with third-party vendors and support teams as required
  • Deploy new systems and updates in a staged, risk-managed manner, ensuring safe and incremental rollouts
  • Survey and adopt best practices in infrastructure and platform management to maintain secure, scalable, and fault-tolerant systems
  • Study the design and implementation details of open-source systems to enhance troubleshooting capabilities and accelerate issue resolution
  • Work transparently with stakeholders to communicate system status, planned maintenance, and infrastructure improvements

Keywords: automation, Ansible, Terraform, observability, Prometheus, Grafana, cloud platforms, AWS, GCP, Azure, container orchestration, Kubernetes, Docker, CI/CD, Jenkins, GitLab

Essential Requirements

  • Bachelor's degree in Computer Science, Engineering, or equivalent professional experience (5+ years in a related infrastructure or systems role)
  • Proficiency in one or more programming languages (Go, Python, or Bash shell scripting), with the ability to implement medium-complexity automation workflows
  • Strong knowledge of Linux or UNIX from both administration and debugging perspectives
  • Hands-on experience operating software systems, infrastructure, and complex applications at scale in production environments
  • Demonstrated expertise in infrastructure-as-code principles and practices
  • Strong problem-solving and software troubleshooting skills with a methodical, analytical approach
  • Experience with server provisioning, particularly from storage and networking perspectives
  • Proven ability to work collaboratively within cross-functional teams and communicate technical concepts clearly
  • Experience with incident response, postmortem analysis, and continuous improvement methodologies

Desirable Skills and Experience

  • Experience with container orchestration platforms, particularly Kubernetes
  • Hands-on experience with Docker and virtualisation technologies
  • Proficiency in managing monitoring stacks, including Prometheus and Grafana
  • Experience with CI/CD systems such as GitLab tools or Spinnaker
  • Knowledge of infrastructure-as-code frameworks, particularly Terraform
  • Experience managing databases such as PostgreSQL or equivalent relational database management systems
  • Experience with artifact repositories and Docker registries
  • Familiarity with cloud platforms (Google Cloud Platform, Amazon Web Services, or Microsoft Azure)
  • Understanding of distributed systems architecture and principles
  • Experience with performance tuning and system optimisation
  • Knowledge of security best practices in infrastructure and systems design
  • On-call support experience and comfort with incident response responsibilities

Arista stands out as an engineering-centric company. Our leadership, including founders and engineering managers, are all engineers who understand sound software engineering principles and the importance of doing things right.

We hire globally into our diverse team. At Arista, engineers have complete ownership of their projects. Our management structure is flat and streamlined, and software engineering is led by those who understand it best. We prioritize the development and utilization of test automat

Benefits

  • Vision insurance
  • Remote work options

Additional Information

Who You'll Work For

We are seeking an experienced and analytically minded Site Reliability Engineer to join our organisation on a permanent, remote basis from Ireland. In this role, you will be instrumental in building, deploying, and operating critical production systems with a steadfast commitment to scalability, reliability, observability, and security. You will work collaboratively with cross-functional teams to ensure our infrastructure remains resilient, efficient, and future-ready. This is an excellent opportunity for a detail-oriented professional who thrives in a dynamic environment and is passionate about solving complex infrastructure challenges.

