Senior Incident Specialist

AB InBev · Campinas, Brazil

Full-time · On-site · 1d ago

CI/CD · Documentation · Leadership · Microservices · Negotiation · Observability

About the role

AB InBev is the leading global brewer and one of the world's top 5 consumer product companies. With over 500 beer brands, we're number one or two in many of the world's top beer markets, including North America, Latin America, Europe, Asia, and Africa.

About AB InBev Growth Group

Created in 2022, the Growth Group unifies our business-to-business (B2B), direct-to-consumer (DTC), Sales & Distribution, and Marketing teams. By bringing together global tech and commercial functions, the Growth Group allows us to fully leverage data and drive digital transformation and organic growth for AB InBev around the world. In addition to supporting well-known global beer brands like Corona, Budweiser, and Michelob Ultra, the Growth Group is home to a robust suite of digital products, including our B2B digital commerce platform BEES, the on-demand delivery services Ze Delivery and TaDa Delivery, and the tabletop beer keg PerfectDraft. We are an exceptional team, focused on understanding and supporting consumer and customer needs, harnessing new technology, and scaling growth opportunities.

The Senior Incident Specialist is accountable for the end-to-end management of major application incidents and crises affecting business-critical systems within the Application Management Services (AMS) portfolio. This role ensures rapid service restoration, clear communication, and strong governance during high-severity incidents, while driving continuous improvement of application reliability, operability, and resilience. The position operates at the intersection of software engineering, application support, product teams, and business stakeholders, with a strong emphasis on application behavior, integrations, data flows, and code-level root causes rather than infrastructure events.

Responsibilities

Major Incident & Crisis Management
Act as Major Incident Owner (MIO) for high-severity application incidents (Severity 1 & 2), coordinating all technical and business activities.
Lead application-focused incident bridges, ensuring correct prioritization of remediation activities across development, AMS, QA, vendors, and product teams.
Maintain full ownership from detection through service restoration and closure, including executive updates and formal incident closure.
Make risk-based decisions (rollback, feature disablement, workaround vs. permanent fix) in collaboration with application owners.
Application-Centric Incident Coordination
Drive troubleshooting for:
Application defects and regressions
Integration failures (APIs, middleware, third-party services)
Data inconsistencies, corruption, or processing failures
Performance degradation at the application or transaction level
Failed deployments, releases, or configuration changes
Ensure infrastructure findings are translated into application-level impacts and actions, keeping the focus on software behavior.
Crisis Communication & Stakeholder Management
Serve as the single point of accountability for business stakeholders during crises.
Provide clear, concise, and application-impact-focused communication to:
Senior leadership / executives
Business owners and product managers
Customer-facing teams (support, service desk)
Manage customer-impact messaging in partnership with business and communications teams.
Post-Incident Review & Continuous Improvement
Facilitate blameless post-incident reviews (PIRs) focused on application design, testing gaps, monitoring, and operational readiness.
Ensure root cause analysis (RCA) clearly addresses application logic, design decisions, and process failures, not only symptoms.
Track and enforce corrective and preventive actions, including:
Code fixes and refactoring
Monitoring and alerting improvements
Deployment and release process enhancements
Runbook and support model updates
Governance, Metrics & Readiness
Define and monitor incident KPIs related to application stability (MTTR, recurrence rate, failed changes, incident volume by app).
Ensure applications meet operational readiness standards before go-live (monitoring, supportability, rollback, documentation).
Contribute to resilience engineering efforts, including chaos testing, failover simulations, and crisis drills.

Requirements

Bachelor's degree in a related field.
Strong expertise in Application Management Services (AMS) and production application support.
Deep understanding of application architectures (monolith, microservices, event-driven), APIs and integrations, CI/CD pipelines and release management, and application monitoring, logging, and observability tools.
Excellent facilitation skills to manage multi-disciplinary technical teams during crises.
Executive-level communication skills, able to translate technical failures into business impact and recovery plans.
Strong conflict resolution and negotiation capabilities.
Experience in application support, software engineering, SRE, or technical service management.
Experie

Benefits

Paid time off
