Incident Manager
Responsibilities
End-to-End Problem Management (Sev1-Sev5)
- Own production issues from detection through full resolution
- Quickly assess impact and assign severity (Sev1-Sev5)
- Lead triage, investigation, and resolution efforts
- Maintain clear ownership throughout the lifecycle, regardless of which teams are involved
- Drive fast, effective restoration of service
Resolve More, Closer to the Team
- Directly investigate and resolve issues whenever possible
- Partner closely with operations and reliability teams to resolve issues without unnecessary escalation
- Reduce dependency on engineering teams for repeat or well-understood problems
- Build reusable knowledge and patterns to improve team self-sufficiency
Root Cause Analysis & Prevention
- Perform and/or lead root cause analysis (RCA)
- Identify recurring patterns and systemic weaknesses
- Drive fixes that prevent entire classes of issues from recurring
- Ensure issues are fully resolved, not just temporarily mitigated
Incident Leadership & Communication
- Lead real-time response for high-impact production issues
- Coordinate cross-functional teams with clarity and urgency
- Communicate clearly with stakeholders, including leadership, during active incidents
- Provide structured updates on impact, progress, and next steps
Process, Tooling & Continuous Improvement
- Improve incident management processes, workflows, and operating models
- Build and maintain runbooks and response procedures
- Identify opportunities for automation and better monitoring
- Ensure high-quality documentation and knowledge sharing
What You Bring
Required Experience & Skills
- 5+ years of experience in incident management, site reliability engineering (SRE), production operations, or similar roles
- Proven ability to lead and resolve production issues under pressure
- Strong technical breadth across systems, applications, and infrastructure
- Ability to diagnose and troubleshoot issues directly, not just coordinate response
- Excellent communication skills: clear, concise, and composed under pressure
- Strong sense of ownership and accountability
- Analytical mindset with strong problem-solving skills
Requirements
- Experience in high-availability, large-scale production environments
- Familiarity with tools such as ServiceNow, Jira Service Management, or PagerDuty
- Experience with cloud platforms (AWS, Azure, or GCP)
- Familiarity with monitoring and observability tools
- Knowledge of ITIL frameworks (helpful, but not required)
How We Measure Success
Success in this role is defined by outcomes:
- Faster time to restore service (MTTR)
- More issues resolved directly within the incident management / operations function
- Reduction in high-severity issues (Sev1 / Sev2)
- Fewer recurring issues due to strong root cause resolution
- Improved system reliability and stakeholder confidence
What Makes This Role Different
- You are not a ticket router; you are a problem solver
- You don't just respond to incidents; you prevent them from happening again
- You work across the stack, not within a silo
- Your work directly improves both system reliability and engineering productivity
- Compensation: The base salary range for this position is $80,000 to $110,000.
- Candidates must be legally authorized to
Additional Information
Job Description: AssetMark is a leading strategic provider of innovative investment and consulting solutions serving independent financial advisors. We provide investment, relationship, and practice management solutions that advisors use to help clients achieve wealth, independence, and purpose.

The Job/What You'll Do: We are looking for an experienced Incident Manager to own the end-to-end lifecycle of production issues across our technology platforms and services. This role goes beyond traditional incident coordination. Incident Managers are hands-on operators responsible for driving rapid service restoration, resolving issues directly whenever possible, and eliminating recurring problems at their source. You will work across the full technology stack, partnering with engineering, infrastructure, and operations teams to ensure reliable system performance and a high-quality user experience.

This is a high-visibility role that requires strong technical judgment, clear communication under pressure, and a bias toward action. You will play a critical role in improving system reliability while helping teams spend less time firefighting and more time building.

This role participates in a 24/7 operating model, including on-call responsibilities. We can only consider candidates who are able to work a hybrid schedule and are located near our Charlotte, NC office.