
GPU Systems Engineer

Wehrtyou · London, UK
Full-time · On-site · 1mo ago
Ansible · Linux · Python

Responsibilities

  • Design, build, and optimize large-scale distributed GPU compute clusters
  • Identify and resolve performance bottlenecks in GPU workloads across compute, storage, and networking layers
  • Collaborate with research and development teams to profile, benchmark, and fine-tune GPU-based workloads
  • Automate system deployment, monitoring, and troubleshooting across thousands of nodes
  • Collaborate with research and engineering teams to support evolving workloads
  • Own critical infrastructure projects from concept through implementation and support
  • Test and deploy new hardware and software, and partner with vendors to resolve complex issues

Requirements

  • 5+ years of experience in large-scale Linux systems engineering in HPC, AI, or distributed infrastructure roles
  • Extensive experience in Linux system installation, performance tuning, and troubleshooting
  • Expertise in troubleshooting distributed GPU workloads
  • Deep knowledge of GPU optimization and performance
  • Proficiency in Python scripting and automation frameworks
  • CUDA or C/C++ experience is a plus
  • Experience with NVIDIA technologies beyond CUDA, such as NCCL, GPUDirect RDMA, and NVLink
  • Familiarity with configuration management tools (e.g. Salt, Ansible, Puppet, Chef)
  • Comfortable diagnosing complex system issues at the hardware, OS, and network levels
  • Strong communication and organizational skills; able to collaborate across diverse technical teams
  • Thrive in fast-paced environments and excited by high-impact work

Culture

Hudson River Trading (HRT) brings a scientific approach to trading financial products. We have built one of the world's most sophisticated computing environments for research and development. Our researchers are at the forefront of innovation in the world of algorithmic trading.

Benefits

Performance bonus

Additional Information

Hudson River Trading (HRT) is looking for GPU Systems Engineers to help scale and evolve our exceptionally sophisticated HPC/AI research environment. Joining our Research and Development team, you will collaborate with experts responsible for the compute, storage, operating systems, and automation tools that enable our trading and research to run 24/7 across the globe. We design, grow, and operate infrastructure at large scale, including triple-digit petabyte-scale storage and massive CPU and GPU clusters in globally distributed data centers. As such, this is a high-impact role with broad scope, from HPC/AI cluster design and performance tuning to troubleshooting and automation for thousands of nodes.

