Sr. Principal Software Engineer

Cerence · Remote · Full-time

Responsibilities

  • Optimize and deploy high-performance LLM inference pipelines
  • Own inference runtimes across data center, edge, and embedded platforms
  • Push model performance through quantization, kernel fusion, and cache optimization
  • Drive latency and throughput improvements that directly impact production products
  • Enable efficient, reliable deployment without dependence on external vendors

Core Responsibilities

  • Inference Engines & Runtime
    • Build deep expertise in, and ownership of, vLLM, TensorRT-LLM, llama.cpp, and QAIRT
    • Extend and tune inference engines with custom CUDA kernels
    • Adapt runtimes for constrained and embedded deployment environments
  • Quantization & Numerical Optimization
    • Implement and evaluate quantization strategies, including INT8, INT4, FP4, FP8, and mixed precision, as well as AWQ and GPTQ
    • Balance accuracy, latency, memory footprint, and throughput
  • KV Cache Optimization
    • Optimize key-value (KV) cache performance through paging, prefix caching, and cache-aware memory layout design
    • Reduce memory pressure while sustaining high throughput
  • Latency & Throughput Optimization
    • Design and tune batching strategies, continuous batching, and speculative decoding
    • Optimize tail latency and tokens/sec under real production traffic patterns

What Success Looks Like

  • Models deploy efficiently on edge and embedded devices, not just servers
  • Throughput (tokens/sec) significantly outperforms baseline implementations
  • End-to-end latency is minimized and predictable
  • Inference cost per request is materially reduced
  • The company is no longer dependent on partners for inference optimization

Required Experience & Skills

  • Strongly Required
    • Proven experience optimizing ML inference performance in production
    • Deep understanding of GPU architecture and memory hierarchies
    • Hands-on experience with CUDA and low-level performance tuning
    • Experience deploying models beyond research environments
  • Critical Technical Skills
    • Inference engines: vLLM, TensorRT-LLM, llama.cpp, QAIRT
    • CUDA kernel development and profiling
    • Quantization techniques: INT8/INT4/FP4/FP8, AWQ, GPTQ
    • KV cache optimization and memory layout design
    • Latency optimization: batching, speculative decoding, continuous batching

Common Problems You'll Be Solving

  • Deploying efficiently on edge or embedded targets
  • Achieving competitive tokens/sec
  • Reducing and stabilizing inference latency

You will be responsible for closing these gaps, creating a major competitive advantage.

Benefits

Salary range: $141,400 to $226,300 USD. It is not typical for offers to be made at or near the top of the range; the actual salary will be determined based on experience and other job-related factors.

We offer a generous compensation and benefits package (in addition to the base salary), including:

  • Annual bonus opportunity
  • Insurance coverage (medical, dental, vision, life, and disability)
  • Paid time off
  • Paid holidays
  • Company contribution to the RRSP (Registered Retirement Savings Plan)
  • Equity awards for certain positions and levels
  • Remote and/or hybrid work, depending on the position

All compensation and benefits are subject to the terms and conditions of the underlying plans or programs, as applicable, and may be amended, terminated, or replaced from time to time.

Additional Information

A Moving Experience.

Who is Cerence AI?

Cerence AI is the global leader in AI for transportation, specializing in building AI and voice-powered companions for cars, two-wheelers, and more that enable people to focus on what matters most. With over 500 million cars shipped with Cerence AI's technology, we partner with leading automakers (such as Volkswagen, Mercedes, Audi, Toyota, and many more), mobility providers, and technology companies to power intuitive, integrated experiences that create safer, more connected, and more enjoyable journeys for drivers and passengers alike.

Our Driving Force

Our team is dedicated to pushing the boundaries of AI innovation, working around the globe from headquarters in Burlington, Massachusetts, USA, and 16 other offices across Europe, Asia, and North America. We bring together diverse backgrounds and varied skill sets with the shared goal of advancing the next generation of transportation user experiences. Our culture is customer-centric, collaborative, fast-paced, and fun, with continuous opportunities for learning and development to support your career growth.

Interested in having a significant impact in a dynamic industry with a high-performing global team? We're looking for an exceptional Senior Principal Software Engineer who is ready to drive the future of mobility with us!

