Incident Response Analyst II

Astreya · Singapore, Singapore
Full-time · Remote · Today
Compliance · Data Analysis · Documentation · Incident Response · Risk Management · Stakeholder Management

Responsibilities

  • Real-Time Infrastructure Monitoring
      • Perform 24x7 monitoring of critical facility systems across global data centers, including:
          • Electrical power systems
          • Mechanical systems
          • HVAC and cooling infrastructure
          • Fire detection and suppression systems
          • Water systems and supporting infrastructure
      • Continuously monitor EPMS, BMS, DCIM, and centralized monitoring platforms.
      • Detect abnormal operating conditions and alarms.
      • Acknowledge and investigate alarms promptly.
      • Track incidents and issues through to closure.
      • Identify monitoring gaps and recommend improvements to monitoring coverage.
  • Incident Response and Coordination
      • Provide first-level incident triage and technical assessment.
      • Respond to facility alarms and operational events in real time.
      • Execute escalation procedures according to defined protocols.
      • Coordinate with internal teams, site personnel, vendors, and regional stakeholders to ensure timely issue resolution.
      • Support major incident management activities for events such as:
          • Utility power failures
          • UPS and generator events
          • Cooling/HVAC failures
          • Fire alarm activations
          • Water leakage events
          • Security and environmental alerts
      • Maintain end-to-end ownership of incidents until resolution.
  • Ticket Management and Change Coordination
      • Create, update, and manage event tickets within established SLA targets.
      • Process work orders and monitor completion quality.
      • Track maintenance activities and change requests.
      • Support change management processes and ensure operational compliance.
      • Maintain accurate records of facility maintenance activities and change windows.
  • Compliance and Operational Governance
      • Monitor and follow up on preventive maintenance activities and routine operational changes.
      • Review technical documentation submitted by vendors and service providers, including:
          • Method of Procedure (MOP)
          • Risk Assessment (RA)
          • Standard Operating Procedure (SOP)
      • Ensure maintenance activities comply with operational standards and freeze-period requirements.
      • Support risk management and operational audit activities.
  • Monitoring Platform and Data Administration
      • Maintain monitoring platform master data and infrastructure records.
      • Ensure the accuracy, completeness, and timeliness of asset and alarm information.
      • Support platform optimization and continuous improvement initiatives.
      • Maintain facility logs, event records, and operational documentation.
  • Reporting and Data Analysis
      • Analyze facility operational data and identify trends or recurring issues.
      • Prepare operational reports and performance summaries.
      • Provide recommendations to improve reliability and operational efficiency.
      • Maintain records required for audit, compliance, and management reporting.
  • Operational Support and Continuous Improvement
      • Participate in after-hours support and emergency escalations.
      • Provide remote support for overseas data center operations when required.
      • Support centralized cross-regional operations and collaboration.
      • Contribute to process improvements and monitoring platform enhancements.
      • Perform other duties as assigned to support business continuity and operational excellence.

Requirements

  • Associate Degree, Diploma, or higher in Engineering, Information Technology, Facilities Management, or related disciplines.
  • Minimum 2 years of experience in data center operations, facility monitoring, NOC, command center, or mission-critical environments.
  • Working knowledge of:
      • Electrical systems
      • Mechanical systems
      • HVAC and cooling infrastructure
      • Fire detection and suppression systems
      • Building Management Systems (BMS)
      • Electrical Power Monitoring Systems (EPMS)
      • DCIM or centralized monitoring platforms
  • Experience working with incident management and escalation procedures.
  • Strong communication and coordination skills.
  • Ability to work in a 24x7 rotating shift environment.
  • Ability to manage multiple priorities in high-pressure situations.
  • Fluent in English.
  • Chinese language proficiency (reading, writing, and verbal communication) is preferred, as the role involves Chinese-language alarm messages, documentation, and communications.
  • Experience in:
      • Network Operations Center (NOC)
      • Facility Operations Center (FOC)
      • Data Center Operations
      • Critical Environment Operations
      • Mission Critical Facilities
  • Experience supporting global or cross-regional operations.
  • Familiarity with structured incident, change, and problem management processes.
  • Understanding of data center capacity management (space, power, cooling).
  • Experience working with CMMS, DCIM, EPMS, BMS, or ticketing platforms.
  • Ability to perform root cause analysis and drive issue resolution.
  • Desired Competencies
      • Strong sense of ownership and urgency.
      • Excellent communication and stakeholder management skills.
      • Detail-oriented with strong documentation practices.
      • Analytical and problem-solving mindset.
      • Ability to learn quickly and adapt to changing operational environments.
      • Team-oriented with a proactive and customer-focused attitude.
  • Preferred Certifications
      • Candidates with t

Benefits

Remote work options

Interested in this role?

Apply on the company's website.