ITSM Incident & Problem Manager

Convera · Pune, India

Full-time · On-site · 2w ago

Datadog · Documentation · Grafana · Leadership · Observability · Process Improvement

Responsibilities

Incident & Major Incident management
Serve as the Incident Manager / Major Incident Manager for high-severity and business-impacting incidents by organizing incident bridges and war rooms and driving rapid triage, clear ownership, and timely decision-making.
Ensure incidents are properly classified, prioritized, and escalated based on impact and urgency.
ITSM Process Ownership & Governance
Enforce ITIL-aligned Incident and Problem Management practices
Ensure accurate and complete documentation within ServiceNow, including impact and affected services, incident timelines, and root cause summaries and follow-ups
Act as Problem Manager: identify recurring issues and systemic risks, and ensure RCAs are completed with actionable outcomes
Act as a process authority during incidents, ensuring teams adhere to defined ITSM standards
Service Availability, Reliability & KPIs
Own operational oversight of service availability and reliability: monitor and manage key service health indicators, including service availability and uptime, incident volumes and severity trends, MTTR and MTTD, and SLA and OLA adherence
Use observability data to proactively identify service degradation and emerging risks
Escalate systemic availability or reliability concerns to leadership with data-backed insights
Observability & Operational Intelligence
Actively leverage observability platforms (e.g., Grafana, Datadog)
Partner with engineering and SRE teams to improve monitoring coverage, alert quality, and signal-to-noise ratio
Ensure alerting and escalation via PagerDuty aligns with service criticality.
Communication & Executive Engagement
Serve as the primary communication lead during incidents, delivering concise, executive-level updates that articulate business impact, current status, mitigation steps, and next milestones
Translate complex technical details into clear business language
Maintain confidence and composure while engaging senior leaders during high-pressure events
Post-Incident & Continuous Improvement
Facilitate or support post-incident reviews, identifying trends, gaps, and opportunities for process improvement, tooling enhancement, and better operational readiness
Contribute to the evolution of Command Center playbooks, runbooks, and response standards
Required Qualifications & Experience:
3-6 years of experience in:
Incident Management
Major Incident / Command Center operations
Production operations or site reliability support
Proven experience managing high-severity incidents in 24×7 environments
Demonstrated ownership of service reliability and operational KPIs
ITSM & Process Expertise
Strong working knowledge of ITIL / ITSM frameworks
Deep hands-on experience with:
Incident Management
Major Incident workflows
Problem Management
Experience enforcing ITSM discipline across distributed technology teams
Skills & Competencies
Exceptional communication and facilitation skills
Strong analytical mindset with comfort using metrics and dashboards
Ability to operate decisively in high-pressure situations
Influences outcomes without formal authority
Comfortable interfacing with executive leadership

Requirements

Experience in regulated or customer-critical environments (FinTech, Payments, SaaS)
Exposure to ITSM tools such as ServiceNow and PagerDuty
Exposure to monitoring tools such as Datadog, Grafana, and Dynatrace
About Convera

Benefits

Health insurance

Additional Information

Tech Operations (ITIL) Analyst at Convera Pune - WFH

We are establishing a Global Command Center responsible for 24×7 operational oversight, service availability, and incident leadership across the organization. The Incident Manager plays a critical role in protecting service reliability, enforcing ITSM discipline, and ensuring consistent operational outcomes.

This role goes beyond reactive incident handling. The Incident Manager is accountable for service health, availability, reliability KPIs, and the end-to-end execution of Incident and Problem Management practices, while serving as the primary interface between technology teams, business stakeholders, and executive leadership during incidents.

Core Objectives of the Role:
Maintain high service availability and reliability across customer-facing and critical internal services
Lead major incidents with clarity, structure, and urgency
Ensure consistent execution of ITSM processes, particularly Incident and Problem Management
Provide real-time operational visibility and executive-ready communications
Drive continuous improvement in operational KPIs and service stability
