Software Engineer, Inference
About the role
We are seeking Inference Engineers to accelerate the performance of Pika's AI-driven products. In this highly technical role, you will operate at the intersection of cutting-edge inference acceleration, GPU parallelism, advanced model deployment, and video generation technologies. Your expertise will drive significant improvements to model speed and efficiency, ensuring our creative AI systems deliver industry-leading user experiences at scale. You will design and optimize inference pipelines, implement state-of-the-art acceleration techniques, and work closely with researchers and engineers across the team to push the boundaries of what's possible in real-time AI deployment. Your efforts will play a foundational role in powering the next generation of Pika's video and language models.
Responsibilities
- Accelerate Inference: Lead and implement advanced inference acceleration techniques, including attention optimization and quantization for efficient model serving.
- Maximize GPU Parallelism: Engineer and optimize GPU strategies across tensor, sequence, and pipeline parallelism (TP, SP, PP) for maximal efficiency and scalability.
- Programming for Performance: Develop and optimize high-performance computing kernels and distributed workloads using CUDA and NCCL.
- Advance AI Deployment: Collaborate with research and engineering teams to bring state-of-the-art videogen and large language models into production.
- Improve Training Efficiency: (Bonus) Contribute to improvements in model training speed, stability, and resource utilization as part of our deployment lifecycle.
- Technical Excellence: Drive rigorous code reviews, participate in technical discussions, and mentor fellow engineers on best practices in inference and GPU programming.
Requirements
- Experience: 3+ years of engineering experience, with a strong track record in inference acceleration and model deployment at scale.
- Inference Mastery: Proven expertise in inference optimization, including quantization, attention acceleration, and deep learning compiler stacks.
- GPU & Parallelism: Deep knowledge of GPU programming (CUDA, NCCL) and experience with SP, TP, PP, and other forms of parallelism for distributed inference.
- AI Domain Knowledge: Familiarity with video generation (videogen) models and large language models (LLMs).
- Collaboration: Strong cross-discipline communication skills; able to drive shared goals across research and engineering functions.
- Ownership Mindset: Self-driven, solutions-oriented, and capable of managing ambiguity in a fast-paced startup environment.
- Bonus: Experience in enhancing training efficiency, stability, or resource optimization for large models.
- Experience with high-throughput video or real-time streaming model deployment
- Familiarity with distributed training and optimization toolkits
- Contributions to open source projects in AI infrastructure or deep learning compilers
- Startup or rapid prototyping experience