Director, HPC Infrastructure Engineering

External

Guardant Health · Palo Alto, CA

Full-time · Remote · 2w ago

Accessibility · AWS · Azure · Leadership · Linux · Mentoring

Benefits

Health insurance · Remote work options

Additional Information

Company Description

Guardant Health is a leading precision oncology company focused on guarding wellness and giving every person more time free from cancer. Founded in 2012, Guardant® is transforming patient care and accelerating new cancer therapies by providing critical insights into what drives disease through its advanced blood and tissue tests, real-world data and AI analytics. Guardant tests help improve outcomes across all stages of care, including screening to find cancer early, monitoring for recurrence in early-stage cancer, and treatment selection for patients with advanced cancer. For more information, visit guardanthealth.com and follow the company on LinkedIn, X (Twitter) and Facebook.

Guardant Health's High-Performance Computing (HPC) team builds and operates the computational technology infrastructure backbone of the company. This includes scalable data storage that holds petabytes of genomics data, high-performance compute clusters running a custom bioinformatics pipeline in production and R&D environments, and the software infrastructure that hosts an ecosystem of services for internal data processing and external data integration.

To facilitate Guardant Health's fast growth in the next few years, the HPC team is seeking a strong technical engineering leader who can help maintain and grow the HPC infrastructure during its expansion, while partnering with other engineering functions (Corporate IT, SQA and DevOps/SRE) as well as the R&D user community and Lab Operations. This is a hands-on technical leadership position that will leverage your expertise in HPC environments, as well as your experience leading and managing a team.

Role: Director, HPC Infrastructure Engineering

Location: Preference is given to candidates located in the San Francisco Bay Area with the ability to work onsite in Redwood City and Palo Alto; however, the role offers partial remote flexibility. Onsite presence is required during rotational coverage, scheduled maintenance windows, and cluster deployment activities.

In this role, you will primarily lead an engineering team to:

- Oversee and manage the HPC environment (compute, storage, network, physical infrastructure, and software) serving multiple Production and Development clusters
- Integrate HPC systems with on-prem and cloud-based systems and data sources as required
- Administer multiple HPC clusters and associated cluster file systems
- Research, design, and implement next-generation HPC solutions
- Diagnose and resolve production system stack issues, leveraging software utilities down to the source code level (e.g., shell scripts, Python, etc.)
- Maintain and monitor infrastructure and facilities to ensure operational stability
- Drive continuous improvement initiatives to enhance reliability and performance as workloads and data volumes scale
- Ensure control, integrity, and accessibility across systems and applications serving multiple concurrent users
- Provide operational oversight for systems at remote and international locations
- Collaborate with offsite consultants to sustain and optimize infrastructure performance
- Partner with vendors to procure, troubleshoot, upgrade, repair, and replace systems as needed
- Foster a culture of continuous engineering improvement through design and architecture review, mentoring, feedback, and development and monitoring of key performance metrics
- Hire, coach, and mentor individuals; build a strong cross-functional organization
- Partner with a diverse customer base to understand requirements, priorities, and processes
- Propose and implement new projects or recommend system improvements
- Observe quality standards appropriate for an FDA-governed and CLIA/CAP-compliant diagnostic laboratory
- Manage budgets to balance refresh of obsolete equipment and software, scaling to support company growth, utilizing fixed headcount and contractor/consulting resources
- Participate in a 24/7 on-call rotation

Required:

- B.S. in Computer Science or related technical field, or equivalent experience
- 10 years' experience with high-performance computing platforms, preferably in organizations handling large volumes of sequenced genomic data, within a commercial enterprise
- Experience with software-defined infrastructure and cloud computing: Google Cloud Platform, Amazon Web Services (AWS), etc.
- GPU and petabyte-scale storage platform management experience
- Design, deployment, support and troubleshooting experience in a complex computing environment
- HPC Engineering team management experience (either directly or in a matrixed environment)
- 4+ years of networking experience with certification of CCNA or better
- 4+ years of Linux/Unix system administration, knowledge of Unix network protocols, TCP/IP, core infrastructure technologies and virtualization
- 2+ years of large-scale data storage and compute clusters (HPC) infrastructure
- 2+ years working in and with on-premise and cloud-based (AWS, Google, IBM and Azure) data centers
- 2+ years of building software release and o
