Senior Manager Software Development, AWS Systems Manager

External
Full-time, On-site, 1d ago
AWS

Responsibilities

  • Lead three SDMs and their teams through a multi-year roadmap from foundations (2026) through intelligent orchestration (2028)
  • Partner with the Frontier Agents, Pulsar (observability), and Binder (security) teams, who build on your platform as their execution layer
  • Make hard calls about safety vs. speed - how much verification is enough before letting an agent execute a production change?
  • Own operational excellence for services handling 175M+ weekly executions across all AWS regions
  • Grow engineers and managers who can independently design safety-critical distributed systems

Key job responsibilities

You'll own three engineering teams end-to-end - their roadmaps, their operational health, and their people. Specifically:

  • Set technical direction for the platform in partnership with Principal Engineers and product managers. You decide what to build, what to defer, and what to kill.
  • Hire, develop, and retain three SDMs and their engineering teams. You understand the systems deeply enough to challenge technical decisions, and you actively apply generative AI in your daily work.
  • Own service availability and operational excellence for services processing 175M+ automated actions per week across all AWS regions. When something breaks at 2am, your on-call teams handle it - because you built the mechanisms (runbooks, alarms, escalation paths) that make that possible.
  • Drive cross-team partnerships with AI agent teams, observability, and security services who depend on your platform as their execution layer.
  • Represent your teams in business reviews with crisp, data-driven narratives. You'll review your teams' progress monthly with multiple directors, showing roadmap progress, operational health, and customer adoption metrics.
  • Make prioritization trade-offs between new features, tech debt, security compliance, and operational burden reduction.

Additional Information

AI agents can diagnose infrastructure problems in seconds - but they still stop at "here's what you should do" because nobody trusts them to actually press the button. Your teams will build the execution layer that changes that.

This is a Sr. SDM role leading three engineering teams in AWS Systems Manager's Automation suite, making it safe for AI agents and human operators to take real actions on cloud infrastructure at scale. Systems Manager Automation already runs 175M+ steps per week across 668K active accounts. The next chapter is turning it into the standard execution interface for autonomous operations - where AI agents (AWS Frontier Agents, third-party tools, customer-built agents) can safely execute runbooks with pre-execution impact analysis, blast radius scoping, and automatic rollback. Your teams will ship the capabilities that make customers say "yes, I trust this to run without me watching."

What your teams will build:

  • Pre-execution impact analysis - LLM-powered previews that show customers exactly what a runbook will change before anything executes. Catches risky actions (restarting during peak, deleting resources with active dependencies) in real time.
  • Enterprise-scale document sharing - seamless runbook sharing across hundreds of accounts and organizational units, solving the top customer pain point for enterprises operating at scale
  • The safety framework that unlocks AI agent adoption - granular runtime permissions, approval workflows, and a break-glass execution model where agents execute through runbooks without needing direct infrastructure write permissions
  • Intelligent orchestration - context-aware runbook recommendations that surface the right action at the right time based on operational signals

Why this is a rare opportunity:

  • You're building at the intersection of AI and infrastructure safety - the problem that determines whether autonomous operations stays a demo or becomes production reality
  • Your customers are the largest enterprises in the world, and your decisions about what's "safe enough" directly shape how the industry approaches AI-driven operations
  • The flywheel is real: better safety → more trust → more content → more integrations → more adoption. You're early enough to influence the shape of it.
  • Three teams is the right size - enough leverage to ship meaningful capabilities every quarter, small enough to know the architecture and the engineers

