Design and develop robust inferencing software (TensorRT/TensorRT-LLM) optimized for functionality and performance across platforms
Perform performance analysis, optimization, and tuning of deep learning inference workloads
Track academic and industry advancements in AI and update TensorRT/TensorRT-LLM features accordingly
Provide feedback on architecture and hardware design and development
Collaborate across hardware, software, and research teams to shape the direction of machine learning inferencing across NVIDIA platforms
Own and deliver technical work with scope based on experience, ranging from complex features to substantial parts of larger projects, with increasing independence and technical leadership over time
Publish key technical results at leading scientific and engineering conferences
What we need to see:
Master's or higher degree in Computer Engineering, Computer Science, Applied Mathematics, or a related computing-focused field (or equivalent experience)
Strong C/C++ or Python programming and software design experience, including debugging, performance profiling, and test design
2+ years of work experience
Strong curiosity about artificial intelligence and familiarity with the latest developments in deep learning, including generative models, multimodal systems, and large neural networks
Experience working with deep learning frameworks such as PyTorch, TensorRT/TensorRT-LLM, NeMo, or vLLM
Proactive, self-driven, and able to work independently
Excellent written and verbal communication skills in English
Demonstrated ability, commensurate with experience, to take technical ownership, solve complex problems, and contribute effectively in cross-functional environments
Additional Information
NVIDIA is hiring software engineers for its AI Computing team. Academic and commercial groups around the world are using GPUs to power a revolution in deep learning-powered AI, enabling breakthroughs in areas like generative AI, computer vision, speech recognition, recommender systems, and large-scale language and multimodal models. Join the team building the inferencing software (TensorRT/TensorRT-LLM) that will be used across our product lines. The ability to work in a fast-paced, delivery-focused environment is required, and excellent interpersonal skills are a must.