Inference Performance Engineer
External · Full-time · Remote · 1mo ago
Python · Transformers
About the role
Serving frontier models at scale requires solving novel systems problems at every layer of the stack. As an Inference Performance Engineer, you'll own the runtime that turns accelerators into a production serving system, optimizing throughput, latency, and cost across thousands of nodes. You'll work alongside hardware and compiler teams operating at the frontier of AI silicon design.
Responsibilities
- Build and improve the inference runtime
- Design scheduling, continuous batching, KV cache, and prefill/decode disaggregation
- Implement low-precision kernels and speculative decoding
- Improve throughput, latency, and cost per token
- Collaborate with hardware teams on kernels, operators, and graph optimizations
- Own the OpenAI-compatible API surface and serving protocol
- Build benchmarking, profiling, and regression infrastructure
Requirements
- BS in CS, EE, or related field, or equivalent experience
- Software engineering experience: Rust, Go, Python, or C++
- Understanding of concurrency, memory, and tail latency
- Understanding of modern inference: transformers, attention, KV cache, batching, speculative decoding, quantization
- Experience with model serving frameworks: vLLM, TGI, SGLang, TensorRT-LLM, llama.cpp, or custom runtimes
- GPU or ASIC programming experience: CUDA, ROCm, Triton, or vendor-native toolchains
- Experience with low-precision inference (FP8, FP4, INT4)
- Profiling and benchmarking experience: Nsight, perf, custom harnesses
Benefits
- Top-tier compensation structured to recognize and retain the best talent
- Meaningful equity
- Comprehensive medical, dental, vision, life, and disability insurance
- Parental leave for all new parents, including adoptive and surrogate journeys
- Flexible PTO
- Paid Holidays
- Relocation support
- Flexible schedule
Equal Employment Opportunity
We're an Equal Opportunity Employer and do not discriminate on the basis of any protected status under applicable law.