
HPC Engineer, Metal Net

CoreWeave · London, UK
Full-time · On-site · 2w ago
Ansible · Bash · Compliance · Documentation · GDPR · Grafana

Responsibilities

  • Deploy, operate, and support NVLink/NVSwitch platforms across large data center environments.
  • Troubleshoot Linux, networking, hardware, firmware, performance, and stability issues in production.
  • Build automation and improve runbooks, dashboards, alerts, and lifecycle workflows.
  • Collaborate with teams across CoreWeave, external vendors, and customer-facing stakeholders.
  • Drive assigned work to completion with clear communication, thoughtful prioritization, and early visibility into risks or blockers.
  • Participate in on-call, incident response, root cause analysis, and follow-up improvements.
  • Contribute to reliable workflows that scale across regions, platforms, and fleet growth, with ownership calibrated by level.

Requirements

  • Strong Linux system administration and troubleshooting skills.
  • Networking fundamentals and common troubleshooting tools.
  • Production debugging experience using logs, metrics, and command-line tools.
  • Server, network, GPU, or data center hardware troubleshooting experience.
  • Practical scripting or automation experience in Python, Go, Bash, or similar.
  • Clear communication, documentation, collaboration, and on-call readiness.
  • Curiosity to learn specialized GPU interconnect technologies such as NVLink, NVSwitch, and InfiniBand.
  • Ansible or other infrastructure automation tooling.
  • Kubernetes application development or operations experience.
  • Grafana, Prometheus, PromQL, or similar observability systems.
  • Large fleet operations across Linux systems, network devices, GPUs, or infrastructure components.
  • InfiniBand, RDMA, HPC networking, or low-latency/high-bandwidth fabrics.
  • BMC, Redfish, IPMI, firmware lifecycle management, or hardware management APIs.
  • NVLink, NVSwitch, NVIDIA GPU platforms, NVUE, SONiC, or network operating systems.
  • To fulfill our obligation to protect client data, successful applicants offered employment with CoreWeave will be required to complete a basic criminal record check, conducted in compliance with GDPR. Employment offers are conditional upon receiving satisfactory check results.

Benefits

In addition to a competitive salary, we offer a variety of benefits to support your needs, including:

  • Family-level Medical Insurance
  • Family-level Dental Insurance
  • Generous Pension Contribution
  • Life Assurance at 4x Salary
  • Critical Illness Cover
  • Employee Assistance Programme
  • Tuition Reimbursement
  • Work culture focused on innovative disruption

Benefits may vary by location.

Equal Opportunity

Recruitment Agencies

CoreWeave does not accept speculative CVs. Any unsolicited CVs received will be treated as the property of CoreWeave and your Terms & Conditions associated with the use of CVs will be considered null and void. Any unsolicited CVs sent by your company to us - that is to say, in any situation where we have not directly engaged your company in writing to supply candidates for a specific vacancy - will be considered by us to be a "free gift"

Dental insurance · Equity / stock options · Performance bonus

Additional Information

CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at www.coreweave.com. We're proud to be a Living Wage accredited Employer.

CoreWeave is building and operating some of the largest GPU infrastructure in the world. The Metal Net team owns the high-bandwidth GPU interconnect platforms that make large-scale AI and HPC workloads possible, including NVLink and NVSwitch-based systems. We are looking for an HPC Engineer to deploy, operate, troubleshoot, and improve these platforms across our global data center footprint. This role is a strong fit for engineers who enjoy production troubleshooting, hardware-adjacent systems work, automation, observability, and learning specialized infrastructure deeply. Prior NVLink experience is helpful, but not required.


