Member of Technical Staff, Exceptional Generalist (Remote)

Inferact · Worldwide

Full-time · Remote · 5mo ago

Kubernetes · Python · PyTorch

About the role

This is a globally remote opportunity. We're seeking exceptional generalist engineers who can work across the entire vLLM stack: from low-level GPU kernels to high-level distributed systems. This role is designed for self-directed, autonomous individuals who can identify the highest-leverage problems and solve them end-to-end without constant guidance. You'll work asynchronously with our San Francisco headquarters while maintaining full ownership of critical infrastructure. You might be optimizing CUDA kernels one week, designing distributed orchestration systems the next, and implementing new model architectures the week after. The work you do will directly impact how the world runs AI inference.

Potential focus areas include:
Inference Runtime: Push the boundaries of LLM and diffusion model serving. Work at the core of vLLM to optimize how models execute across diverse hardware and architectures.
Kernel Engineering: Write the low-level kernels and optimizations that make vLLM the fastest inference engine in the world, running on hundreds of accelerator types.
Performance & Scale: Build the distributed systems that power inference at global scale. Design foundational layers enabling vLLM to serve models across thousands of accelerators with minimal latency.
Cloud Orchestration: Build the operational backbone for cluster management, deployment automation, and production monitoring that enables teams worldwide to serve AI models without friction.

Requirements

Core Requirements:
Bachelor's degree or equivalent experience in computer science, engineering, or a similar field
Demonstrated ability to work autonomously and drive projects to completion without close supervision
Excellent asynchronous communication skills and ability to collaborate effectively across time zones
Strong track record of shipping high-impact work in complex technical environments
Deep expertise in at least one of: systems programming, GPU/accelerator programming, distributed systems, or ML infrastructure

Technical Depth (strong in at least two):
CUDA kernels or equivalent (Triton, TileLang, Pallas) with deep understanding of GPU architecture
High-performance distributed systems in Rust, Go, or C++
Python with PyTorch internals and LLM inference systems (vLLM, TensorRT-LLM, SGLang)
Kubernetes, container orchestration, and infrastructure-as-code at scale
Transformer architectures, KV-cache memory management, and model serving
Contributions to vLLM or other major open-source ML/systems projects
Experience with multiple accelerator platforms (NVIDIA, AMD, TPU, Intel)
Knowledge of quantization techniques, ML-specific kernel optimization, or compiler technologies
Track record of improving system reliability and performance at scale
Widely shared technical blog posts or impactful side projects in the ML infrastructure space

Logistics

Location: Fully remote, worldwide. We're timezone-flexible but expect regular overlap with Pacific Time for critical syncs.
Compensation: We offer competitive compensation (salary + equity) relative to local market conditions.
Visa sponsorship: We sponsor visas on a case-by-case basis.
Benefits: Inferact offers competitive benefits appropriate to your location, including health coverage where applicable.

Benefits

Health insurance · Vision insurance · Remote work options · Flexible schedule · Equity / stock options

Additional Information

Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. Founded by the creators and core maintainers of vLLM, we sit at the intersection of models and hardware, a position that took years to build.
