Research Intern - Video World Models (Research & ML Systems)
About the role
We are seeking an exceptional Research Intern to join our core team in building the next generation of interactive Video World Models. While traditional generative AI focuses on generating passive pixels (e.g., text-to-video), our mission is fundamentally more ambitious: we are building foundational "World Models" that inherently understand physics, causality, action spaces, and complex dynamics directly from internet-scale data. Our goal is to train models that can simulate and "dream" complex virtual worlds, allowing users and agents to explore and interact with them in real time.
This is not a purely theoretical role. Training interactive world models at this scale requires pushing the absolute limits of modern GPUs. We operate at the intersection of cutting-edge generative AI research and high-performance machine learning systems. We are looking for "full-stack" hacker-researchers: visionary thinkers who are also elite engineers, capable of co-designing novel neural architectures and engineering the highly optimized infrastructure required to train them across thousands of GPUs.
What You Will Do
Architect & Scale Foundation Models: Design, train, and scale state-of-the-art interactive world models (combining Diffusion, Autoregressive Transformers, VAEs, LLMs, VLMs) on massive video datasets.
Push the Boundaries of ML Systems: Architect highly scalable distributed training pipelines, utilizing advanced model and data parallelism to train massive models efficiently on large-scale GPU clusters.
Optimize for Efficiency: Profile and optimize model architectures to break through memory and compute bottlenecks. Write high-performance, custom hardware kernels to maximize Model FLOPs Utilization (MFU) and enable real-time, low-latency inference.
Who We Look For