DevOps Engineer - Engineering Team Manager, Vice President

BlackRock · Mumbai, India

Full-time · Hybrid · 4d ago

Capacity Planning · Forecasting · Observability · Routing · SAFe

Responsibilities

Own reliability outcomes
Deliver measurable improvements in availability, latency, recovery time, and incident recurrence for critical workloads.
Establish and mature SRE practices: SLIs/SLOs, error budgets, and reliability governance.
Build AI‑enabled, signal‑driven operations
Replace alert volume with AI‑correlated signals that prioritize true business impact and early risk indicators.
Implement and improve detection, correlation, and routing workflows integrated into operational processes and tooling.
Engineer self‑healing systems (automation-first)
Design, implement, and govern automated remediation for known failure patterns (with safe guardrails and audit trails).
Maintain structured human oversight for novel scenarios and ensure continuous learning feeds back into automation.
Embed reliability and operability into engineering lifecycle
Partner with Engineering/Architecture/Product to build operability, observability, and resilience-by-design into services.
Drive root-cause elimination and reduce recurrence through systemic fixes, not repetitive recovery.
Change-aware resilience + risk posture
Own change-aware operations: use AI risk signals to anticipate failures based on historical data, dependency graphs, and weak points.
Support production readiness: capacity planning, disaster recovery exercises, and disciplined change governance.
Evidence, auditability, and resilience expectations
Ensure AI-driven systems are observable, explainable, and auditable, meeting operational and regulatory expectations.
Develop and lead a high-performing global team, fostering strong ownership, technical depth, and a culture of accountability and continuous improvement.
Qualifications / Competencies
Bachelor's degree in computer science/engineering (or equivalent practical experience).
10+ years across Service Management, DevOps, SRE, Product Engineering, and/or large-scale production operations.
Strong hands-on experience with observability/monitoring/telemetry platforms, focused on actionable insights and reliability outcomes.
Proven experience transitioning environments from reactive support to proactive, signal-driven/AI-assisted operations.
Experience designing, tuning, and governing automation and AIOps workflows, enabling automated remediation while retaining structured human oversight for exceptions.
Experience implementing change-aware operations, drift detection/correction, and data-driven reliability governance to reduce incident recurrence.
Experience with AI-assisted capacity forecasting and proactive scaling for performance predictability and cost efficiency.
End-to-end operational fluency: telemetry → ITSM integration → automated execution.
Experience sponsoring or governing AI-assisted/autonomous operational platforms.

Benefits

To help you stay energized, engaged and inspired, we offer a wide range of benefits including a strong retirement plan and a flexible schedule.

Additional Information

About this role

You can work with us at one of the top FinTech companies. We sell our Aladdin platform to over 200 of the top global corporations, which in total manage about a quarter of all the world's money under management. BlackRock is a global but close-knit team of individuals who share a common goal of providing the very best possible level of support to our business partners and customers. From the top of the firm down, we embrace the diversity of values, identities and ideas brought by our employees. We are serious about our people and offer Flexible Time Off, collaborative working spaces and several other benefits.

The individual selected for this position will be responsible for covering business-critical computing workloads, real-time/interactive processing, data transfer services, application and new technology on-boarding and upgrades, and recovery procedures. The international team is split into 4 global regions to provide 24/7/365 support.

This role is accountable for enterprise‑scale reliability, risk posture, and operational resilience through the adoption of AI‑enabled, autonomous infrastructure and service operations. The leader transitions the organization from human‑driven, reactive operations to AI‑supervised, self‑healing systems, ensuring technology platforms scale safely, predictably, and in alignment with business growth and regulatory expectations. The role owns strategy, outcomes, and governance. If any of this excites you, we want to talk to you.

Team Overview

The Service Management Operations Group is responsible for monitoring, supporting, and administering production environments for all BlackRock businesses (including subsidiaries and BlackRock Solutions), acting as a first responder for troubleshooting, problem resolution, and escalation. Collaborating with skilled professionals across the globe and managing a broad range of technologies and applications, the Operations Group delivers service quality and excellence through teamwork, innovating operational processes, and being part of the One BlackRock culture.
