AI Infrastructure Engineer

NIO · San Jose

Full-time · Hybrid · 1w ago

Caching · Deep Learning · Embedded Systems · LLMs · Load Balancing · PyTorch

About the role

We are looking for a senior AI Inference Infrastructure Software Engineer with strong hands-on experience building, optimizing, and deploying high-performance, scalable inference systems. This position focuses on designing, implementing, and delivering production-grade software that powers real-world applications of Large Language Models (LLMs) and Vision-Language Models (VLMs).

This is an exciting opportunity for an engineer who thrives at the intersection of AI systems, hardware acceleration, and large-scale, robust deployment, and who wants to see their contributions ship in production at scale. In this role, you will directly shape the architecture, roadmap, and performance of the AI capabilities of our AIOS platform, driving innovations that make LLM/VLM systems fast, efficient, and scalable across cloud, edge, and hybrid edge-cloud environments. You will work closely with system, hardware, and product teams to deliver high-performance inference kernels for hardware accelerators, design scalable inference serving systems, and integrate optimizations such as tensor parallelism and custom kernels into production pipelines. Your work will have immediate impact, powering intelligent automotive systems in the next generation of electric vehicles.

Roles and Responsibilities

Design and implement high-performance, scalable inference systems for LLMs and VLMs across cloud, edge, and hybrid edge-cloud platforms.
Develop and optimize custom kernels and operators for specific hardware accelerators (GPU, NPU, DSP, etc.), improving throughput, latency, and memory efficiency.
Integrate advanced optimization techniques such as KV-cache management, tensor/model parallelism, quantization, and memory-efficient execution into production inference systems.
Partner with system and hardware teams to ensure tight hardware-software integration and optimal performance across diverse compute environments.
Translate architectural requirements into robust, maintainable, production-ready software that meets performance, safety, and reliability standards.
Define and drive the evolution roadmap for LLM/VLM inference in the AIOS stack, ensuring scalability and adaptability to new workloads.
Stay ahead of industry trends and competitor solutions, applying best practices from both AI and large-scale systems engineering.

Requirements

5+ years of hands-on software development experience in building and optimizing AI inference systems at scale.
Direct experience with LLM/VLM internals, including Transformer-based architectures, inference bottlenecks, and optimization techniques.
Strong expertise in performance engineering: kernel development, parallelism strategies, memory optimization, and distributed inference systems.
Proficiency with GPU/NPU programming (CUDA or vendor-specific SDKs), compiler toolchains, and deep learning frameworks (PyTorch or TensorFlow).
Strong programming skills in C/C++, with a track record of delivering high-performance, production-grade software.
Solid foundation in computer architecture, systems programming (CPU/GPU pipelines, memory hierarchy, scheduling), and embedded systems.
BS/MS in Computer Science, Computer Engineering, or related technical field.
Excellent communication and collaboration skills, with the ability to work across cross-functional teams.

Preferred Qualifications

Master's or PhD degree in Computer Science, Electrical/Computer Engineering, or related fields, plus 5 years of industry experience.
Experience building inference serving systems for large models, including batching, scheduling, caching, and load balancing.
Expertise in hardware-aware model optimization (e.g., kernel fusion, mixed precision, quantization, pruning).
Familiarity with edge and embedded AI.

Benefits

Vision insurance

Additional Information

About NIO

NIO is a pioneer and a leading company in the premium smart electric vehicle market. Founded in November 2014, NIO's mission is to shape a joyful lifestyle. NIO aims to build a community starting with smart electric vehicles to share joy and grow together with users. NIO designs, develops, jointly manufactures, and sells premium smart electric vehicles, driving innovations in next-generation technologies in autonomous driving, digital technologies, electric powertrains, and batteries. NIO differentiates itself through its continuous technological breakthroughs and innovations, such as its industry-leading battery swapping technologies, Battery as a Service (BaaS), and its proprietary autonomous driving technologies and Autonomous Driving as a Service (ADaaS).

NIO's product portfolio consists of the ES8, a six-seater smart electric flagship SUV; the ES7 (or the EL7), a mid-large five-seater smart electric SUV; the ES6, a five-seater all-round smart electric SUV; the EC7, a five-seater smart electric flagship coupe SUV; the EC6, a five-seater smart electric coupe SUV; the ET7, a smart electric flagship sedan; and the ET5, a mid-size smart electric sedan.
