Skip to main content
Back to jobs

HPC Performance Engineer

External
CoreWeave logoCoreweave · New York, NY
Full-timeOn-site2w ago
BashDockerDocumentationGrafanaKubernetesLinux
Cover LetterConnect

Prepare for this interview

Elite

AI-generated questions, company research, and talking points tailored to this role


About the role

Develop and maintain tools for establishing systems performance baselines Develop and maintain performance regression analysis testing automation Design and maintain performance regression test pipelines for HPC workloads Debug and Tune fabric-level performance to ensure low-latency high throughput configurations Development of telemetry for performance analysis across distributed clusters of servers Triage and fix performance issues in Linux Collect data, produce metrics and visualizations that communicate performance information compared to benchmarks; this data should lead to appropriate business decisions and toward greater automation that improves customer experience in relation to performance Define Linux and OS requirements, specifications, and system architecture in relation to systems performance, in collaboration with cross-functional teams. Along with these responsibilities there will also be cross team collaboration to triage and resolve bottlenecks

Responsibilities

  • Kernel H ardware - A cceleration - V irtualization - O perating Systems - C ontainerization - K ubelet
  • Our Team's Stack:
  • Python, Go, bash/sh, C
  • Prometheus, Victoria Metrics, Grafana
  • Linux Kernel (custom build), Ubuntu
  • Intel/AMD/ARM CPUs, Nvidia GPUs, DPUs, Infiniband and Ethernet NICs
  • Docker, kubernetes (k8s), KubeVirt, containerd, kubelet

Requirements

  • 5+ years of professional experience in Systems/HPC Performance Engineering, Benchmarking, and/or Validation.
  • Bachelor's degree in Computer Engineering, Electrical Engineering, Computer Science, or a related field.
  • Strong experience with MPI workloads and distributed system performance analysis
  • Familiarity with RoCE, InfiniBand, and GPUDirect/Data Direct I/O, NUMA, etc in HPC workloads
  • Hands-on use of public HPC benchmarks (HPCC, HPL, OSU, MLPerf-HPC, STREAM, IO500)
  • Extensive, deep experience in Linux internals
  • Fluency with a programming language geared toward automation (Python preferred, but others possible)
  • Experience writing robust, testable code
  • Experience diagnosing and fixing systems performance issues
  • Experiencing with implementing automation testing
  • Ability to effectively prioritize and communicate proposed features and fixes in a remote-employee environment
  • Strong passion for automation, with a commitment to automating processes comprehensively
  • Excellent documentation skills and attention to detail
  • Strong analytical and problem-solving abilities
  • Preferred:
  • Familiarity with QA/QE best practices
  • Familiarity with Golang
  • Opinions about software version control and team collaboration
  • Experience working in Cloud environments
  • Experience as a software engineer writing large-scale applications
  • Experience in open-source community software development
  • Experience with machine learning is a huge bonus
  • Wondering if you're a good fit? We believe in investing in our people, and value candidates who can bring their own diversified experiences to our teams - even if you aren't a 100% skill or experience match.
  • Why CoreWeave?
  • Be Curious at Your Core
  • Act Like an Owner
  • Empower Employees
  • Deliver Best-in-Class Client Experiences
  • Achieve More Together
  • We support and encourage an entrepreneurial outlook and independent thinking. We foster an environment that encourages collaboration and

Benefits

Remote work optionsPerformance bonus

Additional Information

CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at www.coreweave.com .


Your Match

How well this role fits your profile.

Company Intel

What employees say

Worked at CoreWeave? Share your experience

Interested in this role?

Apply on the company's website.

Cover LetterConnect