Software Development Manager, Data Center - GenAI
About the role
AWS Infrastructure Services owns the design, planning, delivery, and operation of all AWS global infrastructure. In other words, we're the people who keep the cloud running. We support all AWS data centers and all of the servers, storage, networking, power, and cooling equipment that ensure our customers have continual access to the innovation they rely on. We work on the most challenging problems, with thousands of variables impacting the supply chain, and we're looking for talented people who want to help. You'll collaborate with people across AWS to help us deliver the highest standards for quality and reliability while providing seemingly infinite capacity at the lowest possible cost for our customers. And you'll experience an inclusive culture that welcomes bold ideas and empowers you to own them to completion.
As the Software Development Manager for the Data Center Agentic AI Platform team, you will lead a team of Software Development Engineers building the agentic GenAI platform for AWS data centers. The platform powers AI-assisted operations across the global data center infrastructure. It serves a broad customer base across AWS data center operations, including design engineers, quality/reliability engineers, supply chain specialists, field engineers, and other vital roles.
You will own the technical vision and strategic roadmap for the platform, driving investments across agentic AI systems, full-stack engineering, and search and knowledge systems. Your leadership will shape the direction of a next-generation AI/ML platform that orchestrates physical work processes, automates decision-making, and improves operational efficiency for a globally distributed user base of more than 30,000 people. You will champion platform thinking by building reusable primitives, APIs, and extensible components that dozens of teams across the Data Center Community build upon.
In this role, you will drive the design and delivery of production-grade agentic systems, including LLM orchestration, tool-calling patterns, agent frameworks, multi-agent orchestration, and intelligent workflow automation. You will partner closely with cross-functional stakeholders, including data center operations, controls engineering, product management, and peer engineering teams, to translate complex operational needs into scalable AI-powered solutions. You will establish and raise the bar on engineering practices for AI systems in production, including code reviews, CI/CD, progressive deployment, observability, and operational readiness. You will also own hiring strategy and talent development, building a high-performing engineering team with deep expertise in generative AI, distributed systems, and full-stack development. Finally, you will communicate platform strategy, technical roadmaps, and business impact to senior leadership with clarity and conviction.
Key job responsibilities
- Lead and mentor a team of SDEs building and operating the Data Center Agentic AI Platform, fostering a culture of ownership, innovation, and operational excellence
- Own the end-to-end technical roadmap for the platform, balancing investments across agentic AI capabilities, platform infrastructure, frontend experiences, and search/knowledge systems workstreams
- Drive the architecture and delivery of agentic AI systems, including LLM orchestration, prompt engineering, skills, agent harnesses, tool-calling patterns, semantic search, and agent frameworks, using technologies such as Amazon Bedrock, AgentCore, and multi-agent orchestration patterns
- Lead the development of full-stack serverless solutions using AWS Lambda, API Gateway, DynamoDB, EventBridge, CDK, and related services to deliver scalable, production-grade platform capabilities
- Own the design of search and knowledge systems, including vector embeddings, hybrid retrieval, document processing pipelines, and semantic chunking, to power the platform's intelligent responses
- Define and implement evaluation frameworks, guardrails, and safety mechanisms for agentic AI systems, including LLM output quality evals, agent behavior testing, content filtering, and responsible AI controls, to ensure reliable and trustworthy platform behavior at scale
- Build and evolve platform primitives and reusable components that enable dozens of teams across the Data Center Community to build AI-powered capabilities on top of the platform
- Partner with data center operations, controls engineering, product management, and peer engineering
Amazon.com Services LLC