Incident & Problem Manager

External

BlackRock · Edinburgh, UK

Full-time · Hybrid · Today

Ansible · AWS · Azure · Data Analysis · Excel · GDPR

About the role

About the BlackRock Returner Programme

The Returner Programme is designed to encourage and support a return to full-time work through a six-month paid internship starting in September 2025. The programme acts as a feeder to full-time permanent employment (subject to individual performance and market conditions).

How we will support you:

- Offer a variety of employee networks to help you integrate into the business and culture
- Provide a mentor and buddy for your day-to-day questions
- Provide coaching and peer support

To qualify for the programme, participants must have the following:

- A career break of at least 24 months (no upper limit). During your career break you may have been working small-scale around other commitments or have not done any paid work.
- Prior experience in Investment Management or Financial Services
- Authorisation to work in the UK

Team Overview

The Service Management team provides industry-standard Incident, Problem and Change Management, alongside infrastructure operational support for Aladdin. We operate using modern engineering practices and tooling, including ServiceNow and AI-enabled workflows, and measure outcomes through clear operational metrics. BlackRock operates a 24/7 Major Incident Management function supporting global clients across Europe, the Americas, Asia Pacific and India. This role is based in Edinburgh and is required to cover core European hours between 09:00 and 17:00.

Role

We are seeking a Problem Manager with a strong passion for technical troubleshooting. This role exists to prevent the recurrence of incidents. A Problem Manager is accountable for identifying, owning and eliminating the root causes of incidents to prevent recurrence and improve service stability. The role partners closely with Engineering and SRE / DevOps teams, leveraging automation, observability tooling and emerging AI-driven insights.
The successful candidate will have a DevOps mindset, be able to actively troubleshoot, and utilise and enhance AI and automation. The role also includes participation in continuous improvement initiatives aimed at improving the stability, performance and resilience of the Aladdin platform, and enhancing Service Management services.

Key Responsibilities

1. Manage a robust Problem Management process to prevent incident recurrence
   - Ensure root causes and preventative actions are clearly captured and translated into an effective Problem Management process.
   - Identify incident trends and repeat patterns, driving scalable remediation to reduce recurrence.
   - Partner with Engineering and SRE / DevOps to embed learnings into automation, observability, runbooks and readiness controls.
   - Design, build and actively maintain a Known Error Database that functions as a real-time operational asset.
   - Work with product teams to design, build and deliver a meaningful process for addressing repeat incidents.

2. Deliver executive-grade communications (AI-drafted, human-approved)
   - Validate, approve and issue regular communications that are concise, informative and appropriate for stakeholders.
   - Ensure communications accurately reflect impact, mitigation progress, key risks and confidence-based ETAs.

3. Drive continuous service improvement and regulatory alignment
   - Drive process and tooling changes that support operational resilience and regulatory requirements, including DORA and GDPR, where applicable.
   - Provide input and ownership for continual service improvement initiatives, with a primary focus on Agentic AI and its application to Incident Management.

4. Create reports to communicate Problem progress

Desirable Experience and Capabilities

- Experience in Incident / Problem Management within a production environment supporting business-critical platforms.
- Proven ability to lead multiple simultaneous problems and drive structured recovery under pressure.
- DevOps mindset, with comfort using observability tooling, automation and operational engineering practices.
- Ability to produce clear, high-quality communications suitable for senior stakeholders.
- Experience operating AI systems for triage, correlation and narrative generation, with sound judgement on when outputs require validation or override.
- Ability to translate repetitive incident activity into automation requirements and drive adoption with engineering partners.
- Experience working in or with FinTech or regulated environments.
- Knowledge of cloud platforms such as Azure and/or AWS and understanding of IaaS / PaaS / SaaS service models.
- Experience with Microsoft Copilot and AI-enabled productivity tooling.
- Programming capability (e.g. Python) to automate common tasks or prototype improvements.
- Familiarity with configuration management, deployment and orchestration tooling (e.g. Ansible).
- Strong data analysis skills using tools such as Splunk, Grafana, Tableau, Excel and/or Power BI.
- Strong expe
