High Performance Computing Hardware / Systems Engineer

External
KLA · Ann Arbor, MI
Full-time · Remote · 5d ago
Assembly · Documentation · Linux · Routing · System Design


Requirements

  • Experience with rack‑scale system design in production or lab environments.
  • Familiarity with HPC networking (100G/200G/400G Ethernet, InfiniBand).
  • Experience with GPU‑accelerated HPC systems.
  • Exposure to HPC software stacks, workload characteristics, and performance tuning.
  • Experience in semiconductor, advanced manufacturing, or inspection systems is a plus.

Key Skills

  • HPC hardware architecture
  • Rack‑level system integration
  • IPMI/BMC management
  • Linux system administration
  • Hardware bring‑up and debugging
  • Networking and storage integration
  • Cross‑functional collaboration (hardware, software, algorithms, manufacturing)

Education and experience: Doctorate (Academic) Degree and 0 years related work experience; Master's Level Degree and related work experience of 3 years; Bachelor's Level Degree and related work experience of 5 years
Base Pay Range: $105,900.00 - $180,000.00 Annually
Primary Location: USA-MI-Ann Arbor-KLA

Benefits

  • Dental insurance
  • Vision insurance
  • 401(k)
  • Paid time off
  • Remote work options
  • Flexible schedule
  • Equity / stock options

Additional Information

Company Overview

KLA is a global leader in diversified electronics for the semiconductor manufacturing ecosystem. Virtually every electronic device in the world is produced using our technologies. No laptop, smartphone, wearable device, voice-controlled gadget, flexible screen, VR device or smart car would have made it into your hands without us. KLA invents systems and solutions for the manufacturing of wafers and reticles, integrated circuits, packaging, printed circuit boards and flat panel displays. The innovative ideas and devices that are advancing humanity all begin with inspiration, research and development. KLA focuses more than average on innovation and we invest 15% of sales back into R&D. Our expert teams of physicists, engineers, data scientists and problem-solvers work together with the world's leading technology providers to accelerate the delivery of tomorrow's electronic devices. Life here is exciting and our teams thrive on tackling really hard problems. There is never a dull moment with us.

Job Description/Preferred Qualifications

HPC Hardware & Rack Engineering

  • Design, assemble, and integrate rack‑level HPC systems, including compute, storage, networking, power distribution, and cooling.
  • Perform hands‑on server bring‑up, including component installation, cable routing, labeling, and rack documentation.
  • Evaluate and select HPC components (CPUs, GPUs, memory, NICs, storage, PCIe cards) based on performance, reliability, and cost.
  • Partner with vendors and internal teams on BOM definition, hardware qualification, and lifecycle management (NPI, sustaining, and EOL transitions).

System Bring‑Up & Management

  • Configure and manage systems using IPMI/BMC tools for remote access, monitoring, firmware updates, and diagnostics.
  • Configure, validate, and document BIOS, firmware, and BMC settings.
  • Debug hardware issues across power, thermals, PCIe, memory, storage, and interconnects.

HPC Software & OS Integration

  • Install, configure, and maintain Linux OS environments (e.g., SUSE, Rocky) on HPC systems.
  • Tune OS settings for HPC workloads (CPU pinning, memory, I/O, networking).
  • Collaborate with software teams to ensure hardware platforms meet application and algorithm requirements.

Networking, Storage & Interconnects

  • Work with high‑speed networking technologies (Ethernet, InfiniBand, RoCE).
  • Integrate and validate high‑bandwidth NICs, switches, and cabling at rack scale.
  • Support local and shared storage solutions (NVMe, DAS, RAID, and NFS/SMB where applicable).

Reliability, Validation & Sustaining

  • Participate in system validation, stress testing, and reliability characterization.
  • Perform root‑cause analysis for field and lab issues involving hardware or HW/SW interactions.
  • Develop and maintain system documentation, rack layouts, and bring‑up procedures.

Required Qualifications

  • Bachelor's or master's degree in Electrical Engineering, Computer Engineering, Computer Science, or equivalent practical experience.
  • 5+ years of experience with HPC hardware systems or large‑scale compute platforms.
  • Strong hands‑on experience with:
      • Server assembly and rack integration
      • CPUs, GPUs, memory, PCIe, NICs, and storage devices
      • IPMI/BMC and out‑of‑band management
      • Linux system administration in HPC or server environments
  • Solid understanding of system‑level characteristics (performance, thermals, power, reliability).
  • Ability to debug issues spanning hardware, firmware, OS, and system configuration.



Interested in this role?

Apply on the company's website.
