ITSM Incident & Problem Manager
External, Full-time, On-site, 2w ago
Datadog, Documentation, Grafana, Leadership, Move, Observability
Responsibilities
- Serve as the Incident Manager / Major Incident Manager for high-severity and business-impacting incidents by organizing incident bridges and war rooms and driving rapid triage, clear ownership, and timely decision-making
- Ensure incidents are properly classified, prioritized, and escalated based on impact and urgency
- ITSM Process Ownership & Governance
- Enforce ITIL-aligned Incident and Problem Management practices
- Ensure accurate and complete documentation within ServiceNow, including impact and affected services, incident timelines, and root cause summaries and follow-ups
- Act as Problem Manager: identify recurring issues and systemic risks, and ensure RCAs are completed with actionable outcomes
- Act as a process authority during incidents, ensuring teams adhere to defined ITSM standards
- Service Availability, Reliability & KPIs
- Own operational oversight of service availability and reliability
- Monitor and manage key service health indicators, including service availability and uptime, incident volumes and severity trends, MTTR and MTTD, and SLA and OLA adherence
- Use observability data to proactively identify service degradation and emerging risks
- Escalate systemic availability or reliability concerns to leadership with data-backed insights
- Observability & Operational Intelligence
- Actively leverage observability platforms (e.g., Grafana, Datadog)
- Partner with engineering and SRE teams to improve monitoring coverage as well as alert quality and signal-to-noise ratio
- Ensure alerting and escalation via PagerDuty aligns with service criticality
- Communication & Executive Engagement
- Serve as the primary communication lead during incidents
- Deliver concise, executive-level updates that articulate business impact, current status, mitigation steps, and next milestones
- Translate complex technical details into clear business language
- Maintain confidence and composure while engaging senior leaders during high-pressure events
- Post-Incident & Continuous Improvement
- Facilitate or support post-incident reviews
- Identify trends, gaps, and opportunities for process improvement, tooling enhancement, and better operational readiness
- Contribute to the evolution of Command Center playbooks, runbooks, and response standards
- Required Qualifications & Experience
Requirements
- 3-6 years of experience in:
- Incident Management
- Major Incident / Command Center operations
- Production operations or site reliability support
- Proven experience managing high-severity incidents in 24×7 environments
- Demonstrated ownership of service reliability and operational KPIs
- ITSM & Process Expertise
- Strong working knowledge of ITIL / ITSM frameworks
- Deep hands-on experience with:
- Major Incident workflows
- Problem Management
- Experience enforcing ITSM discipline across distributed technology teams
- Skills & Competencies
- Exceptional communication and facilitation skills
- Strong analytical mindset with comfort using metrics and dashboards
- Ability to operate decisively in high-pressure situations
- Ability to influence outcomes without formal authority
- Comfortable interfacing with executive leadership
- Experience in regulated or customer-critical environments (FinTech, Payments, SaaS)
- Exposure to ITSM tools such as ServiceNow and PagerDuty
- Exposure to monitoring tools such as Datadog, Grafana, and Dynatrace
- About Convera
- Our teams care deeply about the value we bring to our customers, making Convera a rewarding workplace. This is an exciting time for our organization as we build our team with growth-minded, results-oriented people who are looking to move fast in an innovative environment.
- As a truly global company with employees in over 20 countries, we are passionate about diversity; we seek and celebrate people from different backgrounds, lifestyles, and unique points of view. We want to work with the best people and ensure we foster a culture of inclusion and belonging.
- We offer an abundance of competitive perks and benefits including:
- Market competitive salary.
- Great career growth and development opportunities in a global organization.
- Hybrid schedule with 2 days in the office per week.
- Generous insurance (health, disability, life).
- Paid holidays, time-off, and leave policies for life events (maternity, paternity, adoption).
- Paid volunteering opportunities (5 days per year).
- This position follows a shift roster defined by the company. As we operate in a 24/7 environme
Benefits
Health insurance
Additional Information
Incident & Major Incident management