Incident Response Analyst II

External
Astreya · San Jose, CA
Full-time · On-site · Today
AWS · Azure · CDN · Documentation · GCP · GDPR

Requirements

  • 3+ years of experience in a command center, NOC/FOC, or 24x7 operations environment.
  • Proven ability to triage multiple concurrent incidents, with strong prioritization based on severity and risk.
  • Familiarity with data center layouts, IP networking, servers, and LAN/WAN configurations.
  • Experience with facility and environmental monitoring relevant to incident response.
  • Proficiency with IT systems, high keyboard accuracy (minimum 25 WPM), and comfort using ticketing tools and monitoring platforms.
  • Strong communication skills: able to work independently, provide clear updates, and collaborate across global teams.
  • Understanding of data protection regulations (e.g., GDPR) and how to manage sensitive information securely.
  • Willingness to work on-site, in rotating shifts (including nights, weekends, and holidays) as part of a global support model.
  • Demonstrated ability to consistently meet or exceed SLA timelines for incident management and resolution.
  • Strong analytical and problem-solving skills: able to perform under pressure and resolve incidents efficiently.
  • Exposure to project coordination or process improvement initiatives.
  • Relevant certifications in cloud, server, or edge-related work.
  • Ability to work weekdays or weekends, with possible shift rotation.
  • Salary Range
  • Astreya offers

Benefits

Performance bonus

Additional Information

We are seeking a dynamic and proactive Incident Response Analyst (IRA) to join our global operations team. This role is critical to maintaining operational integrity across data centers, cloud services, and regional facilities. The ideal candidate will be experienced in real-time monitoring, incident response, and collaborative coordination, with the ability to manage both IT infrastructure incidents and facility/environmental alerts. The IRA must consistently meet strict Service Level Agreement (SLA) timelines to ensure rapid detection, response, and resolution of incidents.

Key Responsibilities

Monitoring & Detection

  • Continuously monitor for alerts and alarms across:
      - IT infrastructure: server performance issues, network outages, system failures.
      - Environmental alerts relevant to operations.
      - Cloud-based alerts and alarms.
      - External-facing services: incoming emails, including colocation maintenance notices, service requests from CDN partners, and other critical notifications.
  • Perform initial assessment, containment, and mitigation of cloud infrastructure alerts and alarms.
  • Proven experience managing live infrastructure incidents across OCI/AWS/Azure/GCP within a 24/7 Operations Center.
  • Serve as the first responder to all alerts and notifications: perform prompt triage, categorize severity, and initiate appropriate response actions in alignment with defined SLA timeframes.
  • Create and manage alarm, incident, and change tickets, ensuring documentation quality and strict adherence to SLA timelines.

Incident Response & Coordination

  • Serve as the Incident Commander during active incidents, leading incident bridge calls and orchestrating response efforts in collaboration with internal teams, subject matter experts (SMEs), external vendors, and the Global Operations Center (GOC).
  • Facilitate global shift handovers, ensuring seamless communication and issue tracking between regions.
  • Collaborate closely with data center operators, network engineers, security personnel, and other stakeholders.
  • Ensure all incident response activities comply with strict SLA timelines for acknowledgment, escalation, and resolution.

Documentation & Reporting

  • Produce accurate and timely incident reports, detailing:
      - Executive summary and timeline
      - Root cause (actual or potential)
      - Business impact and remediation steps
  • Draft Post-Incident Reports (PIRs) and assist in scheduling internal post-mortem reviews.
  • Maintain up-to-date standard operating procedures (SOPs), runbooks, and incident handling documentation.
  • Ensure documentation and reporting tasks are completed within SLA requirements.

Operational Support & Process Improvement

  • Support regional managers and program owners in maintaining operational excellence and enhancing processes.

