Senior Team Lead - AI Inference

Wekatest · Remote

Full-time · Remote · Posted 1 month ago

Caching · Kubernetes · Leadership · Python

Responsibilities

Lead & Own: Take end-to-end ownership of AMG's core inference infrastructure - from the NVMe Token Warehouse and GDS data paths to the vLLM/LMCache serving stack - driving technical decisions and delivery outcomes.
Technical Direction: Guide a team of engineers through design, implementation, and delivery of high-throughput, low-latency LLM inference systems, setting high standards for code quality, architecture, and reliability.
Build at Scale: Stay hands-on across the AMG stack (Python, C++, CUDA, vLLM, NIXL/Dynamo, Kubernetes), contributing directly to production systems while providing technical leadership to the team.
Solve Hard Problems: Tackle the real frontier challenges of inference engineering - disaggregated prefill/decode, persistent off-HBM KV caching, RDMA-based transport, and multi-tier GPU memory hierarchies - that define what's possible at scale.
Grow People & Teams: Mentor and coach engineers through regular 1:1s, career coaching, and sprint reviews. Foster a culture of ownership, collaboration, and technical excellence within the AMG team.
Stay on the Frontier: Track the evolving inference ecosystem, benchmark new tools (SGLang, TRT-LLM, NVIDIA Dynamo), and help the team make timely decisions about when to adopt, build, or pivot.

Requirements

Experienced Engineering Leader: 5+ years of professional software engineering, with proven experience leading engineers and owning complex production systems - ideally in AI/ML infrastructure or high-performance computing.
Deep AI Inference Background: Hands-on expertise with LLM serving systems - KV cache reuse, disaggregated prefill/decode, continuous batching, and multi-tier GPU memory hierarchies (HBM → NVMe). Strong familiarity with vLLM, LMCache, NIXL/NVIDIA Dynamo, or similar frameworks.
Systems Engineering Depth: Strong Python and C++ skills (Rust a plus), with a solid grasp of CUDA, GPU memory management, and high-performance I/O - including GPUDirect Storage (GDS), RDMA, and NVMe data paths.
Infrastructure Fluency: Experience deploying and scaling GPU workloads on Kubernetes, with familiarity with RDMA networking, bare-metal GPU clusters (H100/A100), and high-throughput distributed storage.
People Leadership: Demonstrated ability to mentor and develop engineers - running effective 1:1s, supporting career growth, and balancing technical execution with long-term team health.
High Bar for Quality: A strong sense of engineering craftsmanship, with a track record of building reliable, high-throughput systems and continuously improving engineering practices.

The WEKA Way

We are Accountable: We take full ownership, always, even when things don't go as planned. We lead with integrity, show up with responsibility and ownership, and hold ourselves and each other to the highest standards.
We are Brave: We question the status quo, push boundaries, and take smart risks when needed. We welcome challenges and embrace debates as opportunities for growth, turning courage into fuel for innovation.
We are Customer Centric: Our customers are at the heart of everything we do. We actively listen and prioritize the success of our customers, and every decision we make is driven by how we can better serve, support, and empower them to succeed. When our customers win, we win.

Benefits

Health insurance · Dental insurance

Additional Information

We are seeking a Senior Team Lead to lead our AI Inference team. In this role, you will bridge the gap between complex research and production-grade engineering while cultivating a high-performing team culture. You will lead and grow a squad of 3 developers, balancing hands-on technical contribution with strong people leadership: setting direction, unblocking your team, and driving execution on high-performance systems that optimize Large Language Model (LLM) serving. The ideal candidate combines deep technical expertise in inference at scale with the leadership maturity to mentor, motivate, and develop engineers in the evolving ecosystem of serving frameworks such as vLLM and LMCache.