Senior Software Engineer (vLLM)
About the role
CommonAI CIC is a non-profit membership organisation, founded on a belief in collaborative engineering for the safe and responsible development of foundational AI technologies. It is a place where AI startups, enterprises large and small, public sector bodies and academia can share resources and knowledge to co-develop and grow businesses, fast. We are led by experienced founders, investors and engineers who believe that collaborative engineering drives faster AI innovation, and we are backed by a mix of UK Government and private funding to design, build and deploy innovative AI systems.

The Opportunity
We are seeking a Senior Software Engineer with a passion for open-source AI infrastructure to work on deploying, extending and optimising vLLM (and potentially other inference serving engines) to support our projects. You will play a crucial, high-impact role across both our key programmes, the Scaling Inference Lab (https://scalinginferece.org) and the High Assurance programme. As an AI-first company, we strongly believe in collaborative engineering powered by the tools we are helping to build. This role places a major emphasis on using LLMs, AI coding assistants, and autonomous agents for software development.

What You'll Do
Deploy, instrument and monitor open-weight models served using vLLM.
Implement new features within vLLM to support novel hardware architectures as part of the Scaling Inference Lab.
Work with the Panopticon team to identify opportunities to extend vLLM to enhance accuracy, explainability, and accountability when using it to serve models in regulated environments.
Actively collaborate with the open-source vLLM community to propose, review, and upstream core changes.
Troubleshoot, profile, and optimise inference performance, focusing on latency, throughput, and hardware utilisation.

Proven experience working as a Senior Software Engineer with deep expertise in both Python and low-level programming (e.g., C/C++, Rust, assembly or CUDA).
A history of direct contributions to vLLM or similar high-performance open-source ML/AI projects (e.g., PyTorch, Hugging Face TGI, TensorRT-LLM, Ray).
Strong understanding of LLM inference mechanics (e.g., KV caching, continuous batching, memory management, model quantisation).
Experience interacting with, and upstreaming code to, active open-source communities.
Hands-on experience working on performance optimisation for hardware accelerators (GPUs, TPUs, CPU vector units or other accelerators).
A strong enthusiasm for using LLMs, coding assistants, and agents as core tools in your own software development process.

We also value:
Experience working on software systems that operate in highly regulated or high-assurance environments (e.g., financial services).
An understanding of the latest AI safety research and active involvement in that community.
Deep knowledge of modern MLOps practices, CI/CD, and large-scale deployments.

A collaborative and supportive work environment
The opportunity to have a high impact in a growing organisation
Competitive salary package and pension
Professional development opportunities
Networking opportunities with influential people from across the tech sector, financial services, and academia
A vibrant office environment located a few minutes' walk away from Cambridge train station

CommonAI CIC is an equal opportunity employer and is committed to creating an inclusive and diverse workplace.