IT Operations Analyst
About the role
You are as unique as your background, experience and point of view. Here, you'll be encouraged, empowered and challenged to be your best self. You'll work with dynamic colleagues, experts in their fields, who are eager to share their knowledge with you. Your leaders will inspire you and help you reach your potential and soar to new heights. Every day, you'll have new and exciting opportunities to make life brighter for our Clients, who are at the heart of everything we do. Discover how you can make a difference in the lives of individuals, families and communities around the world.

Job Description

This role combines Network Operations Center (NOC) monitoring, data analytics, and Site Reliability Engineering (SRE) to ensure the availability, performance, and reliability of enterprise systems. The position focuses on real-time monitoring, alert and event management, observability (including OpenTelemetry), and data-driven reporting, while continuously improving system reliability, alert quality, and operational efficiency.
The role bridges:
- Operations (incident monitoring and response)
- Analytics (dashboarding and reporting)
- Engineering (SRE, observability, and automation)

Preferred Skills

Monitoring & Observability
- Experience with ScienceLogic, Splunk, ServiceNow ITOM, or similar tools
- Strong understanding of observability concepts (metrics, logs, traces)
- Hands-on or working knowledge of OpenTelemetry (instrumentation and telemetry pipelines)
- Familiarity with event correlation and alert tuning

Data & Analytics
- Strong SQL and PL/SQL skills for data extraction and manipulation
- Dashboarding and visualization using Tableau and ServiceNow

Operations & SRE
- Knowledge of ITIL incident and event management processes
- Experience with high-volume alert environments
- Strong analytical thinking and root cause analysis

Core Technical
- Understanding of infrastructure, applications, and distributed systems
- Exposure to automation or scripting (nice-to-have)

Qualifications
- Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience)
- Proven experience in:
  - NOC, production support, or monitoring roles
  - Incident and alert management using enterprise tools
  - Data analysis and dashboard development
- Exposure to SRE practices (reliability, automation, incident management)
- Familiarity with modern observability frameworks, including OpenTelemetry, is a strong advantage

Responsibilities

A. NOC Operations & Monitoring
- Perform 24/7 monitoring of infrastructure and applications using tools such as ScienceLogic, Splunk, and ServiceNow
- Detect and respond to system anomalies through dashboards and alerts
- Perform first-level troubleshooting and escalate incidents based on severity
- Maintain operational logs, runbooks, and incident documentation

B. Alert & Event Management
- Analyze, acknowledge, and triage alerts based on impact and service disruption
- Correlate events and reduce alert noise through tuning and standardization
- Convert alerts into incidents and manage them through the resolution lifecycle
- Ensure alert quality (actionable, accurate, and prioritized signals)

C. Site Reliability Engineering (SRE)
- Monitor system reliability, availability, and performance
- Participate in incident response, root cause analysis (RCA), and post-incident reviews
- Support SLI/SLO tracking and reliability improvements
- Drive automation and continuous improvement to reduce operational workload

D. Monitoring, Observability & OpenTelemetry
- Implement and support end-to-end observability solutions (metrics, logs, traces)
- Use OpenTelemetry for standardized telemetry collection and instrumentation
- Improve visibility into distributed systems through trace and metric correlation
- Support onboarding of services into monitoring platforms with consistent observability practices
- Improve the signal-to-noise ratio through sound monitoring design

E. Data Analysis & Reporting
- Extract and analyze operational data using Excel and SQL / PL/SQL
- Build dashboards in Tableau and ServiceNow Performance Analytics
- Provide insights on incident trends, uptime, and performance metrics
- Generate executive-level reports (MTTR, incident volume, availability)

F. Dashboarding & Continuous Improvement
- Develop real-time dashboards integrating multiple monitoring tools
- Ensure data consistency across Splunk, ServiceNow, ScienceLogic, and databases
- Identify monitoring gaps and propose improvements
- Enable proactive monitoring through trend and predictive analysis

Job Category: IT - Technology Services
Posting End Date: 23/09/2026