Site Reliability Engineer

External

Robin-ai · Cape Town, South Africa

Full-time · Remote · 10mo ago

AWS · Datadog · Incident Response · Observability · OpenTelemetry · Python

Benefits

Salary: Competitive
Equity package: Generous equity scheme - everyone gets to be an owner of Robin AI!
Annual leave: 20 days PTO, in addition to the public holidays observed in South Africa.
Growth opportunities: We prioritise promotions for high performers and help you to progress your career.

Robin operates a direct hiring model and any speculative CVs shared via agencies will be treated as a gift.

Paid time off · Equity / stock options

Additional Information

About Robin

Robin is on a mission to rebuild the legal industry - starting with making contracts simple for everyone. We are a pioneer in Legal AI, built on proprietary models, licensed data, and deep partnerships with Anthropic and AWS. Since 2019, we've expanded our footprint to 4 continents and have been supporting many of the world's most successful businesses, including GE, Pfizer, KPMG, and UBS.

What will you do as an SRE?

As an SRE at Robin AI, you'll help build and maintain the cloud infrastructure and applications that power our cutting-edge Legal AI platform. You'll collaborate with engineering teams to establish robust monitoring, incident response, and deployment strategies that ensure high availability and reliability of our proprietary models and services, maintaining optimal SLOs for our global customer base.

Your day-to-day responsibilities:

Ensure Robin's systems are highly available and scalable.
Standardise and implement observability practices in our service-based architecture through logging, traces, metrics and monitors.
Design, deploy, and operate infrastructure to support Robin's product teams as we expand into new regions.
Add automation around manual operational tasks.
Collaborate with development team leads to optimise build, test, and deployment processes.
Participate in and improve our on-call and incident handling processes to ensure 24/7 system reliability.

Ideally, you should have the following qualifications:

3+ years of experience in DevOps or Site Reliability Engineering roles.
Proficiency in at least one backend programming language (we use Python).
Strong knowledge of AWS services (ECS, S3, RDS, Lambda, etc.), managed by Terraform.
Comfortable troubleshooting across the full stack, starting from the browser, through the networking components, into the containerised applications and then onto data stores.
Knowledge of observability frameworks and tools (we use OpenTelemetry, CloudWatch and Datadog).
Excellent problem-solving and communication skills.
Experience with AI/ML infrastructure deployments is a plus.

Interested in this role?

Apply on the company's website.