Research, through experimentation and theoretical modeling, the network bandwidth requirements and trade-offs in Prefill-Decode (P/D) disaggregated LLM serving.
Research and implement networking techniques/methods for high-performance KV cache transfers in deployment setups without RDMA networking.
Conduct experiments to evaluate the impact of newly developed non-RDMA KV cache transfer techniques on performance (latency and throughput) in P/D LLM serving.
Collaborate with researchers and engineers to integrate the networking techniques/methods into real-world distributed inference workflows (e.g., in llm-d).
Document findings and contribute to technical reports, research theses, blog posts, or research publications.
Requirements
Currently pursuing a Master's (with research) or Ph.D. degree in Computer Science, Electrical Engineering, Machine Learning, or a related field.
Strong programming skills in C/C++, Rust, and Python.
Experience with the Linux network stack, including frameworks such as DPDK or eBPF/XDP.
Strong analytical and problem-solving skills.
Excellent communication skills and ability to work in a team-oriented research environment.
Familiarity with distributed LLM serving with prefill/decode disaggregation and KV cache transfers is a plus, but not required.
Benefits
Hands-on experience with state-of-the-art systems for ML research.
Mentorship from leading experts in LLM inference and networking.
Opportunity to contribute to research papers, patents, or open-source projects.
Competitive stipend and potential for full-time opportunities.
Remote work options.
Additional Information
Job Summary
At Red Hat, we believe the future of AI is open, and we are on a mission to bring the power of open-source LLMs and distributed LLM inference to every enterprise. We are seeking a highly motivated research intern to join our Machine Learning Research Team. As a research intern, you will work on cutting-edge networking techniques for ML workloads and contribute to research and engineering efforts that make distributed LLM inference faster, more efficient, and more accessible. This is an exciting opportunity to gain hands-on experience in applied networking for ML while working with leading experts in the field.