Taking an engineering approach in leading technical initiatives for automating system engineering efforts to guarantee the reliability of the global Elastic infrastructure.
Growing our global Platform infrastructure to meet the increasing scaling demands by developing and maintaining software, tooling and automations.
Collaborating in an inclusive environment, focusing on operational excellence and uplifting others.
Responding to major incidents and preventing repeated customer impact through prioritised problem management. Our on-call rotation uses a follow-the-sun model in which everyone participates, (mostly) during their working hours.
What you bring
Successes and lessons learned from striving for 'progress not perfection' in the name of Platform reliability. We want to hear about your customer-first approach to solving operational problems from an SRE perspective.
A background in software engineering to collaborate with engineers to expertly identify, implement and deliver solutions, ideally using Golang.
Production experience with public cloud service providers and managing Kubernetes infrastructure at scale.
Passion for developing solutions that involve inclusive communication methods to grow and strengthen partner and team relationships. Examples of working in distributed teams or working remotely are desirable.
Bonus Points
You don't need to have all of these items, but these represent the types of work you will do as a Site Reliability Engineer at Elastic.
You have operated a SaaS product in a public cloud, ideally built using Infrastructure-as-Code tooling such as Crossplane or Terraform.
You have built or operated a Kubernetes-at-scale infrastructure, ideally across multiple cloud providers, and the vital automation to support it.
You have worked with containerized services (such as Docker).
You have proven experience in leading and improving alerting and major incident management processes, and in using metrics systems (e.g. Elastic Stack, Prometheus, Influx) to diagnose issues and quantify impacts to present to others at varying levels of the organization.
You have experience in system administration with professional skills in Linux on distributed systems at scale.
You have diagnosed or designed, implemented and created solutions with the Elastic Stack.
You have experience thriving in a self-organizing, globally distributed team environment that values sharing.
You help team members bring out the best in each other by uplifting others with coaching and mentoring.
Compensation for this role is in the form of base salary. This role does not have a variable compensation component. The typical starting salary range for new hires in this role is listed below.
These ranges represent the lowest to highest salary we reasonably and in good faith believe we would pay for this role at the time of this posting. We may ultimately pay more or less than the posted range, and the ranges may be modified in the future.
An employee's position within the salary range will be based on several factors including, but not limited to, relevant education, qualifications, certifications, experience, skills, geographic location, performance, and business or organizational needs.
The typical starting salary range for this role is:
$148,300 -
Benefits
Remote work options
Equity / stock options
Performance bonus
Additional Information
Elastic, the Search AI Company, enables everyone to find the answers they need in real time, using all their data, at scale - unleashing the potential of businesses and people. The Elastic Search AI Platform, used by more than 50% of the Fortune 500, brings together the precision of search and the intelligence of AI to enable everyone to accelerate the results that matter. By taking advantage of all structured and unstructured data - securing and protecting private information more effectively - Elastic's complete, cloud-based solutions for search, security, and observability help organizations deliver on the promise of AI.
What is The Role
As part of the Platform Engineering department, the SRE team designs, builds, scales and matures the multi-cloud platform for hosting internal and external services such as Elastic Cloud Hosted and Serverless. We develop and extend software and tools that support the rest of the infrastructure, so that we can rapidly deploy products from all corners of Elastic. We want your experience and recommendations to offer a truly exceptional customer experience!