AI Researcher - Inference Optimization
External, Full-time, Remote, 4mo ago
Deep Learning, Documentation, Machine Learning, Python, PyTorch
Responsibilities
- Research and develop techniques to optimize inference performance for large neural networks.
- Improve latency, throughput, memory efficiency, and cost per inference.
- Design and evaluate model-level optimizations (quantization, pruning, KV-cache optimization, architecture-aware simplifications).
- Implement systems-level optimizations (dynamic batching, kernel fusion, multi-GPU inference, prefill vs decode optimization).
- Benchmark inference workloads across hardware accelerators.
- Collaborate with engineering teams to deploy optimized inference pipelines.
- Translate research insights into production-ready improvements.
Required Qualifications
- Strong background in machine learning, deep learning, or AI systems.
- Hands-on experience optimizing inference for large-scale models.
- Proficiency in Python and modern ML frameworks (e.g., PyTorch).
- Experience with inference tooling (e.g., Triton, TensorRT, vLLM, ONNX Runtime).
- Ability to design experiments and communicate results clearly.
Preferred / Nice-to-Have Qualifications
- Experience deploying production inference systems at scale.
- Familiarity with distributed and multi-GPU inference.
- Experience contributing to open-source ML or inference frameworks.
- Authorship or co-authorship of peer-reviewed research papers in machine learning, systems, or related fields.
- Experience working close to hardware (CUDA, ROCm, profiling tools).
What Success Looks Like
- Measurable gains in latency, throughput, and cost efficiency.
- Optimized inference systems running reliably in production.
- Research ideas successfully translated into deployable systems.
- Clear benchmarks and documentation that inform product decisions.
Relevant Research Areas (Bonus)
- Long-context inference optimization
- Speculative decoding
- KV-cache compression and paging
- Efficient decoding strategies
- Hardware-aware inference design
Benefits
Performance bonus
Additional Information
Role Overview
We are seeking an AI Researcher with deep experience in inference optimization to design, evaluate, and deploy high-performance inference systems for large-scale machine learning models. You will work at the intersection of model architecture, systems engineering, and hardware-aware optimization, improving latency, throughput, and cost efficiency across real-world production environments.
Company: featherlessai