Sr. Quality & Reliability Engineer, Hardware Engineering Services
About the role
This team is responsible for the end-to-end lifecycle of mechanical components that enable high-performance, large-scale data center systems. Engineers work as primary owners of their components, partnering closely with suppliers and manufacturing teams to ensure solutions perform reliably from initial development through deployment in the field. Our mission is to deliver highly reliable, scalable solutions that support increasingly demanding compute workloads. This includes solving challenges related to performance and long-term operation in real-world data center environments. Engineers develop deep expertise in their component domains, enabling more effective design reviews, targeted validation strategies, and rapid resolution of issues observed during development and in production systems.
Requirements
- Experience in developing functional specifications, design verification plans and functional test procedures
- 5+ years of root cause analysis and process design experience
- Knowledge of process improvement and quality measures
- Experience working across teams and influencing teams that are not your own
- Experience that includes strong analytical skills, attention to detail, and effective communication abilities
- Bachelor's degree in Reliability Engineering, Physics, Materials Science, Mechanical Engineering, or a related field
- 5+ years of Quality, Reliability, and/or mechanical design engineering work (experience with server platforms or data center operations preferred)
- Strong background in reliability engineering methods (DFMEA, ALT, statistical analysis) at the sub-assembly or component level
- Experience working with interdisciplinary teams to execute product design from concept to production
- Experience with the project management of technical projects
- Master's Degree or PhD in Reliability Engineering, Quality Engineering, or related field
- Demonstrated ability to uncover systemic issues prior to NPI
- Experience with liquid cooling technologies (e.g., cold plates, quick disconnects, fluid systems) strongly preferred
- Knowledge of fluid system failure modes (leakage, corrosion, clogging, material compatibility)
- Analytical, test plan, and test procedure development experience related to server
Additional Information
AWS Infrastructure Services owns the design, planning, delivery, and operation of all AWS global infrastructure. In other words, we're the people who keep the cloud running. We support all AWS data centers and all of the servers, storage, networking, power, and cooling equipment that ensure our customers have continual access to the innovation they rely on. We work on the most challenging problems, with thousands of variables impacting the supply chain - and we're looking for talented people who want to help.
We are seeking a Senior Component Quality & Reliability Engineer to own the end-to-end quality and reliability of liquid cooling components used in next-generation server systems. This role will focus on ensuring that liquid cooling solutions meet performance, durability, and field reliability targets across both new product introduction (NPI) and sustaining phases. You will serve as the primary owner for liquid cooling component quality and reliability, including cold plates, pumps, manifolds, hoses, quick disconnects, and CDU interfaces. You will drive qualification strategy, monitor supplier quality, and provide field performance insights. This is a highly cross-functional role requiring strong technical judgment, structured problem-solving, and the ability to influence design and manufacturing decisions.
Key job responsibilities
- Define and execute reliability validation strategies for liquid cooling components and subsystems
- Develop test plans (e.g., HALT, stress testing, lifecycle validation) aligned to real-world use conditions
- Lead DFMEA and Design for Reliability (DFR) activities to identify and mitigate risks early
- Partner with system teams to ensure proper integration testing and margin validation
- Drive root cause analysis for component qualification and fleet failures
- Drive corrective and preventive actions with internal teams, manufacturing partners and component suppliers
- Translate failure mechanisms into actionable design, material, and process improvements
Lead Supplier Quality
- Define critical-to-quality requirements and process controls for suppliers
- Conduct supplier audits and support qualification readiness
- Partner with suppliers to improve process capability, reliability performance, and defect detection
- Analyze fleet performance data (e.g., failure trends, AFR, ARR) for liquid cooling components
- Apply statistical methods (e.g., Weibull analysis) to predict risk and inform actions
- Drive systemic issue identification and resolution across platforms
Influence Cross-Functional Decisions
- Communicate risks clearly
- Provide data-driven input to design and architecture tradeoffs