Lead Site Reliability Engineer - eFinancialCareers

eFinancialCareers · London, UK

Full-time · On-site · 1mo ago (30+ days old, may be filled)

AWS · Azure · Bash · Capacity Planning · CI/CD · GCP


About the role

We are an energy trading company generating liquidity across global commodities markets. We combine deep trading expertise with proprietary technology and the power of data science to be best-in-class. Our understanding of volatile, data-intensive markets is a key part of our edge. At Dare, you will be joining a team of ambitious individuals who challenge themselves and each other. We have a culture of empowering exceptional people to become the best version of themselves.

As a Lead Site Reliability Engineer, you will play a critical role in ensuring the stability, scalability, and performance of mission-critical, low-latency trading platforms. You'll work closely with traders, quantitative analysts, and engineers in a fast-paced environment where precision and speed are essential. This role combines deep technical expertise with leadership responsibility. You will own the reliability strategy while remaining hands-on with production systems and complex distributed architectures.

You will define and drive reliability practices for latency-sensitive trading infrastructure, establish and enforce service level objectives, and lead incident response across live trading environments. You'll focus on optimising system performance and latency, while collaborating with stakeholders to balance reliability, execution speed, and operational risk. Shaping technical direction, you will actively contribute to debugging, automation, and system design, while mentoring engineers to build a high-performing and resilient engineering culture.

Responsibilities

Ensure real-time trading systems remain stable and performant, proactively monitoring, diagnosing, and resolving issues impacting trading or market connectivity.
Lead production incident response as the first line of defence, driving live troubleshooting, root-cause analysis, and long-term remediation.
Define and own reliability strategy and performance, including service level objectives (SLOs), service level indicators (SLIs), and error budgets for critical trading systems.
Collaborate with trading, engineering, and infrastructure teams on capacity planning, upgrades, and low/zero-downtime migrations.
Drive automation across operational workflows using Python, Bash, and SQL to reduce manual intervention.
Continuously optimise systems and networks, leveraging deep operating system, networking, and performance expertise.
Manage and mentor engineers across London and offshore teams, promoting engineering best practices.
Act as a senior escalation point during high-severity incidents.
Participate in and lead on-call rotations, including nights for ICE market opening hours.
Support releases, maintenance, and trading events outside standard hours including weekends.

Requirements

Extensive experience in Site Reliability Engineering (SRE), DevOps, or Production Support Engineering.
Experience within trading, hedge funds, or financial services, ideally close to front-office systems.
Strong understanding of low-latency, highly distributed trading systems.
Deep knowledge of cloud platforms (AWS, GCP, or Azure).
Deep expertise in Linux/UNIX environments and command-line tooling.
Advanced understanding of application-level networking (TCP/IP, UDP).
Strong programming/scripting skills (Python, Bash) with SQL proficiency.
Experience with CI/CD pipelines and infrastructure-as-code (Terraform, Kubernetes).
Proven experience in incident management, root-cause analysis, and system optimisation.
Experience managing large-scale infrastructure, including capacity planning and migrations.
Ability to leverage AI to develop and deliver solutions at rapid velocity.

Desirable

Experience in a market-making environment.
Strong operating system level performance tuning expertise.
Exposure to exchange connectivity and market data systems.
Understanding of financial markets and trading workflows.

Benefits & perks

Competitive salary
Vitality health insurance and dental cover
38 days of holiday (including bank holidays)
Pension scheme
Annual Bluecrest health checks
A personal learning & development budget of £5000
Free gym membership
Specsavers vouchers
Enhanced family leave
Cycle to Work scheme
Credited Deliveroo dinner account
Office massage therapy
Freshly served office breakfast twice a week
Fully stocked fridge and pantry
Social events and a games room

Diversity matters

Please let us know ahead of the interview and testing processes if you require any reasonable adjustments or assistance during the application process.
We're also proud to be certified as a 'Great Place to Work'.

Additional Information

City of London · Permanent, On-site · Full-time
