The MLA Post-Silicon Validation team owns validation of AWS's next-generation ML training accelerators from first silicon through production deployment in AWS data centers. We sit at the intersection of hardware, firmware, and ML software - ensuring every layer of the stack performs, scales, and meets the quality bar. Our team culture values deep technical ownership, data-driven decisions, and a bias for action. We operate with startup agility backed by AWS-scale resources, and our work directly enables the cloud computing infrastructure that millions of customers rely on for AI/ML workloads.
Requirements
3+ years of non-internship professional software development experience
2+ years of non-internship experience designing or architecting new and existing systems (design patterns, reliability, and scaling)
Experience with RF measurement equipment, including power meters, spectrum analyzers, vector signal generators, network analyzers, and oscilloscopes
Bachelor's degree in Computer Science, Computer Engineering, or Electrical Engineering, or experience in test setup automation using MATLAB, Python, or Perl
3+ years of hands-on experience validating or characterizing at least one high-speed interface technology (HBM, PCIe, DDR, SerDes, or UCIe)
Proficiency with signal integrity measurement techniques: eye diagram analysis, jitter decomposition, and voltage/timing margin assessment
Familiarity with at least one interface specification standard (JEDEC HBM/DDR, PCI-SIG PCIe, or UCIe consortium)
3+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
Experience with HBM PHY training sequences and characterization
Knowledge of forward error correction (FEC) and link reliability metrics (BER, MTBF)
PCB and package-level signal integrity awareness (S-parameters, channel modeling)
Experience with multi-die or chiplet architectures and die-to-die interconnect validation
Familiarity with equalization techniques (CTLE, DFE, FFE) and adaptive tuning
Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.
Additional Information
Annapurna Labs, an AWS organization with development centers in the U.S. and Israel, builds custom silicon and software for AWS customers. Our team combines cloud-scale innovation with world-class expertise across silicon engineering, hardware design, verification, software, and operations to tackle technical challenges that have never been seen before.
Join our Post-Silicon Validation team to validate the critical high-speed interconnects - HBM, PCIe, UCIe, and chip-to-chip links - that enable massive bandwidth and scaling in AWS's next-generation ML accelerators. You'll ensure link integrity, performance, and reliability at the electrical and protocol layers across the full product lifecycle, from first silicon through production deployment in AWS data centers.
You'll work in a fast-paced, startup-like environment alongside some of the brightest minds in the industry on next-generation, internet-scale technology that directly impacts how customers use machine learning acceleration. Your expertise will be instrumental in validating the I/O technologies that enable our custom silicon to scale beyond what was previously possible.
Key job responsibilities
Validate high-speed interfaces (HBM, PCIe, UCIe, custom SerDes) end-to-end from PHY training through sustained traffic
Perform electrical characterization: eye diagrams, jitter analysis, voltage margin, and equalization tuning
Execute protocol-level compliance testing and interoperability validation against industry specifications
Stress-test links across PVT (Process, Voltage, Temperature) corners and aging conditions
Debug link training failures, bit errors, and performance degradation using lab instrumentation and silicon debug features
Collaborate with PHY design teams on equalization optimization, margin improvement, and silicon feedback
A day in the life
Your primary focus is making sure the high-speed links on our AI chips actually work - reliably, at speed, and across all conditions. You'll spend time in the lab: scoping eye diagrams, tracking down why a link dropped during a thermal sweep, or tuning equalization to squeeze out extra margin. Beyond the hands-on debug, you'll also develop compliance test suites and build the characterization data that drives production readiness decisions. When things get weird, you're the one who figures out why.