IT Event Management Engineer
External, Full-time, Hybrid, 1w ago
Ansible, AWS, Azure, Classification, Confluence, Documentation
Responsibilities
- The role is foundational to achieving a proactive, data-driven IT operations model, helping the organization move from reactive firefighting to predictive service assurance.
- Core Competencies
- Technical Proficiency: Understanding of ITOM systems, monitoring frameworks, APIs, and integrations.
- Process Orientation: Ability to document, standardize, and institutionalize best practices through templates and guides.
- Collaboration: Works effectively across infrastructure, application, and service management domains.
- Analytical Thinking: Strong data interpretation skills to identify event trends and patterns.
- Innovation: Drives automation, simplification, and modernization of event workflows.
- Customer Focus: Designs alerting systems that directly enhance reliability, transparency, and user satisfaction.
- Qualifications & Experience
- Education & Certifications
- Bachelor's degree in Information Technology, Computer Science, Engineering, or related field
- ITIL Foundation (minimum); ITIL Intermediate or ITIL 4 Managing Professional is an advantage
- Relevant certifications in cloud platforms (Azure, AWS) or monitoring tools are desirable
Requirements
- 5+ years' experience in IT Operations, Monitoring, or Event Management roles
- Proven experience working with enterprise monitoring and event management tools (e.g., OpsBridge, Azure Monitor, AWS CloudWatch, Site24x7)
- Experience integrating monitoring tools with ITSM platforms for automated incident management
- Hands-on experience in automation and scripting (e.g., PowerShell, Python, Ansible)
- Exposure to hybrid environments (cloud + on-prem infrastructure)
- Technical & Functional Skills
- Strong understanding of ITOM and monitoring frameworks, including event correlation, alert tuning, and noise reduction
- Experience designing or operating event-driven systems for proactive detection and service assurance
- Knowledge of APIs and system integrations
- Experience with dashboards, analytics, and reporting tools (e.g., Power BI, Kibana)
- Familiarity with ITSM processes (Incident, Problem, Change Management)
- Core Responsibilities
- Event Detection, Ingestion & Correlation
- Design and maintain event ingestion pipelines from multiple monitoring sources (e.g., Azure Monitor, AWS CloudWatch, network devices, applications, SaaS systems).
- Develop correlation logic and rules to identify related alerts and minimize redundant or noisy notifications.
- Maintain event taxonomies and classification standards to ensure consistent event tagging, severity, and categorization across systems.
- Automation, Orchestration & Remediation
- Build and maintain automation scripts and workflows to automatically detect and remediate known issues (e.g., restarting services, clearing caches, resizing disks).
- Integrate event management systems with ITSM platforms (e.g., ServiceNow, SMAX) to auto-create and route incidents with contextual data.
- Participate in AIOps initiatives, leveraging predictive analytics and machine learning models to forecast incidents and anomalies.
- Standardization, Templates & Documentation
- Develop standard operating procedures (SOPs), runbooks, and knowledge articles for consistent event triage and escalation processes.
- Create event configuration templates (e.g., for threshold settings, escalation rules, integration blueprints) to ensure monitoring practices are repeatable and scalable.
- Maintain a Monitoring and Event Management Playbook outlining governance, workflows, and automation frameworks.
- Document integration patterns, naming conventions, and API schema mappings to enable faster onboarding of new systems.
- Ensure all documentation is version-controlled, accessible via Confluence or SharePoint, and updated as systems evolve.
- Operational Effectiveness & Continuous Improvement
- Conduct routine health checks on event management systems to ensure optimal performance, data accuracy, and integration stability.
- Analyze event and incident data trends to identify gaps, redundancies, or opportunities for improvement.
- Partner with Service Desk, Cloud, and Network teams to optimize event thresholds, escalat
Benefits
Health insurance
Additional Information
With 75 years of experience, our focus is on helping the most vulnerable children overcome poverty and experience fullness of life. We help children of all backgrounds, even in the most dangerous places, inspired by our Christian faith. Come join our 31,000+ staff working in nearly 100 countries and share the joy of transforming vulnerable children's life stories!
Worked at World Vision? Share your experience