Sr. Quality & Reliability Engineer, Hardware Engineering Services

Amazon Data Services, Inc. · Seattle, WA
Full-time · On-site · 2d ago
AWS

About the role

This team is responsible for the end-to-end lifecycle of mechanical components that enable high-performance, large-scale data center systems. Engineers work as primary owners of their components, partnering closely with suppliers and manufacturing teams to ensure solutions perform reliably from initial development through deployment in the field. Our mission is to deliver highly reliable, scalable solutions that support increasingly demanding compute workloads. This includes solving challenges related to performance and long-term operation in real-world data center environments. Engineers develop deep expertise in their component domains, enabling more effective design reviews, targeted validation strategies, and rapid resolution of issues observed during development and in production systems.

Requirements

  • Experience in developing functional specifications, design verification plans and functional test procedures
  • 5+ years of root cause analysis and process design experience
  • Knowledge of process improvement and quality measures
  • Experience working across teams and influencing teams that are not your own
  • Strong analytical skills, attention to detail, and effective communication abilities
  • Bachelor's degree in Reliability Engineering, Physics, Materials Science, Mechanical Engineering, or a related field
  • 5+ years of Quality, Reliability, and/or mechanical design engineering work (experience with server platforms or data center operations preferred)
  • Strong background in reliability engineering methods (DFMEA, ALT, statistical analysis) at the sub-assembly or component level
  • Experience working with interdisciplinary teams to execute product design from concept to production
  • Experience with the project management of technical projects
  • Master's Degree or PhD in Reliability Engineering, Quality Engineering, or related field
  • Demonstrated ability to uncover systemic issues prior to NPI
  • Experience with liquid cooling technologies (e.g., cold plates, quick disconnects, fluid systems) strongly preferred
  • Knowledge of fluid system failure modes (leakage, corrosion, clogging, material compatibility)
  • Analytical, test plan, and test procedure development experience related to servers

Additional Information

AWS Infrastructure Services owns the design, planning, delivery, and operation of all AWS global infrastructure. In other words, we're the people who keep the cloud running. We support all AWS data centers and all of the servers, storage, networking, power, and cooling equipment that ensure our customers have continual access to the innovation they rely on. We work on the most challenging problems, with thousands of variables impacting the supply chain, and we're looking for talented people who want to help.

We are seeking a Senior Component Quality & Reliability Engineer to own the end-to-end quality and reliability of liquid cooling components used in next-generation server systems. This role focuses on ensuring that liquid cooling solutions meet performance, durability, and field reliability targets across both the new product introduction (NPI) and sustaining phases.

You will serve as the primary owner for liquid cooling component quality and reliability, including cold plates, pumps, manifolds, hoses, quick disconnects, and coolant distribution unit (CDU) interfaces. You will drive qualification strategy, monitor supplier quality, and provide field performance insights. This is a highly cross-functional role requiring strong technical judgment, structured problem-solving, and the ability to influence design and manufacturing decisions.
Key job responsibilities

  • Define and execute reliability validation strategies for liquid cooling components and subsystems
  • Develop test plans (e.g., HALT, stress testing, lifecycle validation) aligned to real-world use conditions
  • Lead DFMEA and Design for Reliability (DFR) activities to identify and mitigate risks early
  • Partner with system teams to ensure proper integration testing and margin validation
  • Drive root cause analysis for component qualification and fleet failures
  • Drive corrective and preventive actions with internal teams, manufacturing partners, and component suppliers
  • Translate failure mechanisms into actionable design, material, and process improvements

Lead Supplier Quality

  • Define critical-to-quality requirements and process controls for suppliers
  • Conduct supplier audits and support qualification readiness
  • Partner with suppliers to improve process capability, reliability performance, and defect detection
  • Analyze fleet performance data (e.g., failure trends, AFR, ARR) for liquid cooling components
  • Apply statistical methods (e.g., Weibull analysis) to predict risk and inform actions
  • Drive systemic issue identification and resolution across platforms

Influence Cross-Functional Decisions

  • Communicate risks clearly
  • Provide data-driven input to design and architecture tradeoffs