ITSM Incident & Problem Manager
Responsibilities
- Incident & Major Incident Management
- Serve as the Incident Manager / Major Incident Manager for high-severity and business-impacting incidents, organizing incident bridges and war rooms and driving rapid triage, clear ownership, and timely decision-making.
- Ensure incidents are properly classified, prioritized, and escalated based on impact and urgency.
- ITSM Process Ownership & Governance
- Enforce ITIL-aligned Incident and Problem Management practices
- Ensure accurate and complete documentation within ServiceNow, including impact and affected services, incident timelines, and root cause summaries and follow-ups
- Act as Problem Manager: identify recurring issues and systemic risks, and ensure RCAs are completed with actionable outcomes
- Act as a process authority during incidents, ensuring teams adhere to defined ITSM standards
- Service Availability, Reliability & KPIs
- Own operational oversight of service availability and reliability
- Monitor and manage key service health indicators, including service availability and uptime, incident volumes and severity trends, MTTR and MTTD, and SLA and OLA adherence
- Use observability data to proactively identify service degradation and emerging risks
- Escalate systemic availability or reliability concerns to leadership with data-backed insights
- Observability & Operational Intelligence
- Actively leverage observability platforms (e.g., Grafana, Datadog)
- Partner with engineering and SRE teams to improve monitoring coverage, alert quality, and signal-to-noise ratio
- Ensure alerting and escalation via PagerDuty aligns with service criticality.
- Communication & Executive Engagement
- Serve as the primary communication lead during incidents
- Deliver concise, executive-level updates that articulate business impact, current status, mitigation steps, and next milestones
- Translate complex technical details into clear business language
- Maintain confidence and composure while engaging senior leaders during high-pressure events
- Post-Incident & Continuous Improvement
- Facilitate or support post-incident reviews
- Identify trends, gaps, and opportunities for process improvement, tooling enhancement, and better operational readiness
- Contribute to the evolution of Command Center playbooks, runbooks, and response standards
- Required Qualifications & Experience:
- 3-6 years of experience in:
- Incident Management
- Major Incident / Command Center operations
- Production operations or site reliability support
- Proven experience managing high-severity incidents in 24×7 environments
- Demonstrated ownership of service reliability and operational KPIs
- ITSM & Process Expertise
- Strong working knowledge of ITIL / ITSM frameworks
- Deep hands-on experience with:
- Incident Management
- Major Incident workflows
- Problem Management
- Experience enforcing ITSM discipline across distributed technology teams
- Skills & Competencies
- Exceptional communication and facilitation skills
- Strong analytical mindset with comfort using metrics and dashboards
- Ability to operate decisively in high-pressure situations
- Influences outcomes without formal authority
- Comfortable interfacing with executive leadership
Requirements
- Experience in regulated or customer-critical environments (FinTech, Payments, SaaS)
- Exposure to ITSM tools such as ServiceNow and PagerDuty
- Exposure to monitoring tools such as Datadog, Grafana, and Dynatrace
Tech Operations (ITIL) Analyst at Convera, Pune - WFH
We are establishing a Global Command Center responsible for 24×7 operational oversight, service availability, and incident leadership across the organization. The Incident Manager plays a critical role in protecting service reliability, enforcing ITSM discipline, and ensuring consistent operational outcomes.
This role goes beyond reactive incident handling. The Incident Manager is accountable for service health, availability, reliability KPIs, and the end-to-end execution of Incident and Problem Management practices, while serving as the primary interface between technology teams, business stakeholders, and executive leadership during incidents.
Core Objectives of the Role:
- Maintain high service availability and reliability across customer-facing and critical internal services
- Lead major incidents with clarity, structure, and urgency
- Ensure consistent execution of ITSM processes, particularly Incident and Problem Management
- Provide real-time operational visibility and executive-ready communications
- Drive continuous improvement in operational KPIs and service stability