Vice President, Site Reliability Engineer
Responsibilities
- Act as a client-facing reliability partner, providing a clear point of coordination during incidents, escalations, onboarding, and major operational events
- Assist with incident management, including technical coordination, issue narrative, stakeholder communication, and follow-through to resolution
- Partner closely with Technology Client Experience, engineering, and platform teams to ensure reliability issues are understood, owned, and driven to closure end to end
- Proactively support onboarding and operational readiness for top-tier clients by identifying systemic risks, validating supportability, and ensuring operational standards are met before scale
- Translate recurring client pain points, escalation themes, and onboarding learnings into actionable systemic reliability improvements across products and platforms
- Shift reliability left by engaging early in new client onboarding, change planning, and design discussions to proactively surface risks
- Help navigate the organization to unblock remediation actions, align stakeholders, and accelerate resolution of high-priority client reliability issues
- Improve engineering culture by reinforcing a deliberate, consistent, and non-reactive approach to client reliability partnership
- Contribute to architectural, operational readiness, and observability discussions with a focus on client impact, resilience, and supportability
- Design and improve monitoring, telemetry, and operational visibility for client-critical workflows and journeys
- Drive detailed root cause investigations for significant client-impacting incidents, with strong focus on prevention and issue avoidance
- Create and coordinate retrospectives for key incidents and onboarding events, ensuring learnings are captured and translated into concrete follow-up actions
- Anticipate opportunities to strengthen the resiliency profile of systems and workflows most important to priority clients
- Act as a culture carrier for SRE principles, helping teams connect engineering decisions to real client experience and trust
Skills/Qualifications
Requirements
- B.S./M.S. degree in Computer Science, Engineering, or a related discipline with 5-8 years of experience
- Strong experience in Site Reliability Engineering, production engineering, or a related reliability-focused role supporting critical systems
- Demonstrated ability to manage complex incident escalations and coordinate effectively across engineering, product, operations, and stakeholder groups
- Strong communication skills, including the ability to translate technical issues into clear, credible narratives for senior stakeholders and client-facing partners
- Experience driving operational readiness, onboarding readiness, or production supportability reviews for high-scale systems or strategic initiatives
- Strong troubleshooting and problem-solving skills, with the ability to identify both immediate remediation paths and underlying systemic issues
- Passion for improving the reliability, resilience, and supportability of highly available systems
- Experience with observability, monitoring, and telemetry tools used to detect, diagnose, and prevent incidents
- Ability to build strong cross-functional relationships and influence outcomes without direct authority
- Self-motivated, highly accountable, and comfortable operating in ambiguous, fast-moving environments
- Knowledge of software development methodologies, release processes, and operational support models
- Strong analytical thinking and a bias toward proactive risk
Additional Information
About this role
BlackRock Company Overview: BlackRock is a global leader in investment management, risk management, and advisory services for institutional and retail clients. We help clients achieve their goals and overcome challenges with a range of products, including separate accounts, mutual funds, iShares® (exchange-traded funds), and other pooled investment vehicles. We also offer risk management, advisory, and enterprise investment system services to a broad base of institutional investors through BlackRock Solutions®. Headquartered in New York City, as of February 5, 2025, we handle approximately $11.5 trillion in assets under management (AUM) and have around 19,000 employees in offices across 38 countries, with a significant presence in key global markets, including North and South America, Europe, Asia, Australia, the Middle East, and Africa.
Role Overview: We're seeking a Site Reliability Engineer (SRE) for a new Client Services-focused role that combines deep reliability engineering with strong client partnership. This role sits closely aligned with our Technology Client Experience team and complements our embedded SRE model by providing focused reliability engagement for priority clients. You will act as a client-facing reliability partner, helping manage escalations, improve onboarding readiness, surface systemic risks, and translate client pain points into durable engineering improvements.