Lead the global Incident Response (IR) program, optimizing processes across detection, triage, containment, remediation, and post-incident analysis.
Hire, mentor, and train global team members on incident response best practices and observability tooling.
Serve as a technical lead and head engineer for creation and management of monitoring tools and services to support F5 infrastructure and business systems.
Serve as the primary incident commander during major incidents, ensuring timely resolution, excellent communication, and stakeholder alignment.
Define and continuously refine incident response policies, procedures, and runbooks to ensure consistent and effective handling of incidents.
Drive improvements in detection, escalation, and resolution through automation, tooling, and process enhancements.
Define and report KPIs for service reliability, incident response, and observability maturity to senior leadership.
Conduct root cause analyses and lead post-incident reviews to identify lessons learned and prevent recurrence.
Design and lead cross-functional tabletop exercises to strengthen organizational preparedness, communication, and response coordination during major incidents.
Maintain detailed incident records and metrics to support auditing, compliance, and continuous improvement.
Collaborate with ServiceNow teams and architects to manage incidents.
Establish and maintain on-call rotations with teams who own critical applications across the Digital organization.
Establish and lead Problem Management, Change Management, and Configuration Management functions to improve operational excellence across Digital/IT.
Requirements
10+ years managing incident response within NOC/SOC/SRE teams with a focus on monitoring and observability.
Proven track record of managing complex operational incidents in cloud and hybrid environments.
Experience driving continuous improvement and operational excellence in processes such as Problem Management, Change Management, and Configuration Management.
Experience working with and/or managing CMDB governance leveraging an ITSM platform (e.g., ServiceNow).
Experience integrating runbooks, operational processes, and metrics reporting into an ITSM platform (e.g., ServiceNow).
Experience with observability tools, especially tooling focused on synthetics, metrics, and infrastructure telemetry (e.g., Grafana, ThousandEyes, LogicMonitor, Pingdom, Zabbix).
Excellent communication skills with the ability to convey technical information to both technical and non-technical audiences.
Ability to lead u
Benefits
Vision insurance
Additional Information
At F5, we strive to bring a better digital world to life. Our teams empower organizations across the globe to create, secure, and run applications that enhance how we experience our evolving digital world. We are passionate about cybersecurity, from protecting consumers from fraud to enabling companies to focus on innovation.
Everything we do centers around people. That means we obsess over how to make the lives of our customers, and their customers, better. And it means we prioritize a diverse F5 community where each individual can thrive.
Manager, ITSM and Observability
Position Summary
We are seeking a manager to help build our new Site Reliability Engineering and ITSM team to strengthen operational excellence across the Infrastructure & Security and F5 Digital organization. This role will play an important part in Digital's incident management strategy and IT Service Management practices by building out the Reliability Operations Center and monitoring capabilities required to help Digital understand problems before our users do.
The ideal candidate will bring deep expertise in incident lifecycle management, from detection and triage to resolution and post-mortem, and will collaborate cross-functionally to drive continuous improvement in our security posture. This leader will operationalize a world-class incident management program while also defining and implementing the vision for observability across F5's hybrid infrastructure and cloud environments. In addition, this role will be responsible for maturing related ITSM processes including Problem Management, Change Management, and Configuration Management. This role requires strong leadership, technical acumen, and the ability to operate under pressure while maintaining clear communication with stakeholders at all levels.