Incident Response Analyst II
External, Full-time, Remote, Posted today
Compliance, Data Analysis, Documentation, Incident Response, Risk Management, Stakeholder Management
Responsibilities
- Real-Time Infrastructure Monitoring
  - Perform 24x7 monitoring of critical facility systems across global data centers, including:
    - Electrical power systems
    - Mechanical systems
    - HVAC and cooling infrastructure
    - Fire detection and suppression systems
    - Water systems and supporting infrastructure
  - Continuously monitor EPMS, BMS, DCIM, and centralized monitoring platforms.
  - Detect abnormal operating conditions and alarms.
  - Acknowledge and investigate alarms promptly.
  - Track incidents and issues through to closure.
  - Identify monitoring gaps and recommend improvements to monitoring coverage.
- Incident Response and Coordination
  - Provide first-level incident triage and technical assessment.
  - Respond to facility alarms and operational events in real time.
  - Execute escalation procedures according to defined protocols.
  - Coordinate with internal teams, site personnel, vendors, and regional stakeholders to ensure timely issue resolution.
  - Support major incident management activities for events such as:
    - Utility power failures
    - UPS and generator events
    - Cooling/HVAC failures
    - Fire alarm activations
    - Water leakage events
    - Security and environmental alerts
  - Maintain end-to-end ownership of incidents until resolution.
- Ticket Management and Change Coordination
  - Create, update, and manage event tickets within established SLA targets.
  - Process work orders and monitor completion quality.
  - Track maintenance activities and change requests.
  - Support change management processes and ensure operational compliance.
  - Maintain accurate records of facility maintenance activities and change windows.
- Compliance and Operational Governance
  - Monitor and follow up on preventive maintenance activities and routine operational changes.
  - Review technical documentation submitted by vendors and service providers, including:
    - Method of Procedure (MOP)
    - Risk Assessment (RA)
    - Standard Operating Procedure (SOP)
  - Ensure maintenance activities comply with operational standards and freeze-period requirements.
  - Support risk management and operational audit activities.
- Monitoring Platform and Data Administration
  - Maintain monitoring platform master data and infrastructure records.
  - Ensure the accuracy, completeness, and timeliness of asset and alarm information.
  - Support platform optimization and continuous improvement initiatives.
  - Maintain facility logs, event records, and operational documentation.
- Reporting and Data Analysis
  - Analyze facility operational data and identify trends or recurring issues.
  - Prepare operational reports and performance summaries.
  - Provide recommendations to improve reliability and operational efficiency.
  - Maintain records required for audit, compliance, and management reporting.
- Operational Support and Continuous Improvement
  - Participate in after-hours support and emergency escalations.
  - Provide remote support for overseas data center operations when required.
  - Support centralized cross-regional operations and collaboration.
  - Contribute to process improvements and monitoring platform enhancements.
  - Perform other duties as assigned to support business continuity and operational excellence.
Requirements
- Associate Degree, Diploma, or higher in Engineering, Information Technology, Facilities Management, or related disciplines.
- Minimum 2 years of experience in data center operations, facility monitoring, NOC, command center, or mission-critical environments.
- Working knowledge of:
  - Electrical systems
  - Mechanical systems
  - HVAC and cooling infrastructure
  - Fire detection and suppression systems
  - Building Management Systems (BMS)
  - Electrical Power Monitoring Systems (EPMS)
  - DCIM or centralized monitoring platforms
- Experience working with incident management and escalation procedures.
- Strong communication and coordination skills.
- Ability to work in a 24x7 rotating shift environment.
- Ability to manage multiple priorities in high-pressure situations.
- Fluent in English.
- Chinese language proficiency (reading, writing, and verbal communication) is preferred to support Chinese alarm messages, documentation, and communications.
- Experience in:
  - Network Operations Center (NOC)
  - Facility Operations Center (FOC)
  - Data Center Operations
  - Critical Environment Operations
  - Mission Critical Facilities
- Experience supporting global or cross-regional operations.
- Familiarity with structured incident, change, and problem management processes.
- Understanding of data center capacity management (space, power, cooling).
- Experience working with CMMS, DCIM, EPMS, BMS, or ticketing platforms.
- Ability to perform root cause analysis and drive issue resolution.
- Desired Competencies
  - Strong sense of ownership and urgency.
  - Excellent communication and stakeholder management skills.
  - Detail-oriented with strong documentation practices.
  - Analytical and problem-solving mindset.
  - Ability to learn quickly and adapt to changing operational environments.
  - Team-oriented with a proactive and customer-focused attitude.
- Preferred Certifications
  - Candidates with t
Benefits
Remote work options
Company: Astreya