Director, Site Reliability Engineer - Senior Engineering Team Director

External

BlackRock · Edinburgh, UK

Full-time · Hybrid · Today

Agile · Apache · Capacity Planning · Cassandra · Forecasting · Leadership

Requirements

B.S. / M.S. degree in Computer Science, Engineering or a related discipline with 10+ years of experience
Experience leading high-performing engineering/SRE teams, with a track record of driving continuous improvement through automation and AI-enabled operations
Demonstrated ability to represent engineering/SRE priorities, status, and risk to senior leadership stakeholders with clear, executive-ready communication
Hands-on experience building or operating AI-assisted capabilities (AIOps, ML-based anomaly detection, or GenAI workflows) in an engineering/production environment
A passion for providing engineering support for highly available, performant full-stack applications with a "Student of Technology" attitude
Experience with relational and NoSQL databases (e.g., Redis, Apache Cassandra)

Benefits

Our hybrid work model
Health insurance
Vision insurance
Flexible schedule

Additional Information

About this role

Role Overview

We're seeking a Site Reliability Engineering (SRE) Lead to design, build, and maintain resilient, high-scale systems supporting BlackRock's Private Markets platform. In this hands-on leadership role, you'll apply deep engineering expertise to solve complex challenges, guide a global team, shape technical direction, and communicate effectively with senior stakeholders, ensuring the reliability of mission-critical systems that power private market investment workflows and decision-making. You will drive the adoption of AI-driven solutions to accelerate incident detection and triage, reduce toil, improve forecasting and capacity planning, and strengthen end-to-end observability and resilience.

Role Responsibilities

Take ownership of project priorities, deadlines and deliverables using Agile methodologies, with clear outcomes around reliability automation and AI-enabled operations
Understand and refine business and functional requirements, translating them into SLOs/SLIs and AI-assisted observability and support capabilities
Take a hands-on approach to getting work done: this role requires a "roll your sleeves up" mentality, including building and operationalizing reliability tooling and automation that measurably reduces toil and improves stability
Be a leader with vision and a partner in brainstorming solutions for team productivity and efficiency to improve engineering effectiveness
Drive priority setting of the engineering teams, balancing foundational reliability work with delivery of new product features
Improve engineering culture by encouraging continuous focus on reliability across the entire application lifecycle, and by adopting AI-enabled SRE practices (e.g., intelligent alerting, automated diagnosis, and self-healing where appropriate)
Participate proactively in architectural and design decisions, including AI-ready telemetry, data quality, and model integration patterns for operational analytics
Design and implement end-to-end monitoring solutions for application and infrastructure components, leveraging modern observability platforms plus AI/ML techniques for anomaly detection, correlation, and alert noise reduction
Drive the engineering of capacity management and demand forecasting solutions, including predictive analytics/ML approaches where they add measurable value
Act as a culture carrier and leader, passing on SRE knowledge and best practices to the engineering team
Drive detailed root cause investigations for production incidents with rigorous focus on issue avoidance, using AI-assisted correlation/analysis to accelerate time-to-insight
Create/coordinate retros for significant incidents, ensuring learnings are captured in automated/AI-assisted runbooks and embedded into prevention mechanisms
Perform additional core engineering functions, such as adding custom telemetry metrics/logs/traces to the code base of in-scope applications to enable AI/ML-driven operational insights
Anticipate new opportunities to continuously evolve the resiliency profile of scoped applications and infrastructure

Skills/Qualifications

Interested in this role?

Apply on the company's website.