Senior Research Engineer - Video Foundation Models (Pre-Training)

Synthesia · Europe
Full-time · Remote · 3mo ago
AWS · CI/CD · Datadog · Deep Learning · Docker · GANs

About the role

As a Research Engineer in our Video Pre-Training team, you will help build the next generation of production-grade foundation models for human-centric video generation. You will join a highly focused team working at the intersection of large-scale generative modeling, distributed systems, and production engineering. Our mission is to develop and optimize video base models that power realistic, controllable, and emotionally expressive synthetic humans at scale. This is not pure research. This is applied research with direct product impact. You will work on advancing training recipes, scaling distributed systems, improving evaluation frameworks, and optimizing inference to ensure our models are high quality, stable, and efficient enough for real-world deployment. Your work will directly influence models used by tens of thousands of businesses worldwide.

Responsibilities

You will own and execute end-to-end research and engineering projects, from hypothesis to production impact. This includes:

  • Developing and scaling latent video diffusion models tailored for human-centric video generation
  • Designing conditioning mechanisms to improve control (pose, emotion, script, camera) without sacrificing fidelity
  • Advancing distributed training strategies (DDP, FSDP, DeepSpeed, sequence parallelism) under real compute constraints
  • Improving training stability at multi-node scale
  • Designing rigorous evaluation frameworks combining automated metrics and structured human evaluation
  • Optimizing inference for low latency, high resolution, and cost efficiency
  • Running controlled ablations and experiments to drive high-signal modeling decisions
  • Contributing to high engineering standards: reproducibility, experiment tracking, CI/CD, monitoring

You will be expected to move fast, run multiple hypotheses in parallel, identify signal early, and focus on outcomes rather than exploration for its own sake.

Requirements

  • Strong experience training deep learning models at scale
  • Strong Python and PyTorch skills
  • Hands-on experience with diffusion models (image domain required; video preferred)
  • Experience with large-scale multi-GPU / multi-node training
  • Good understanding of distributed training (DDP, FSDP, DeepSpeed or similar)
  • Ability to design controlled experiments and interpret noisy results
  • Experience with video diffusion models
  • Experience in avatar or human-centric generation
  • Familiarity with world / interactive models
  • Experience with GANs or VAEs
  • Experience optimizing inference systems for production

Our stack

  • Python, PyTorch, CUDA
  • DeepSpeed, distributed training & inference
  • Sequence parallelism
  • AWS, SLURM, Docker
  • GitHub, CI/CD pipelines
  • You are research-driven but outcome-focused
  • You care about shipping, not just publishing
  • You can explore multiple ideas quickly and drop low-signal directions early
  • You communicate clearly and present results scientifically
  • You operate independently but collaborate actively across teams

Why join us?

  • Build production-scale video foundation models in a fast-growing Generative AI company
  • Work on human-centric video generation with real-world impact
  • Tackle hard problems in scaling, stability, and controllability
  • Influence the direction of next-generation synthetic human technology
  • Join a highly technical, high-ownership environment where your work ships

If you want to work on cutting-edge generative video models and see your research power real-world products, we'd love to talk.

Our culture

Serving 50,000+ customers (and 50% of the Fortune 500)
We're trusted by leading brands such as Heineken, Zoom, Xerox, McDonald's and more. Read stories from happy customers and what 1,200+ people say on G2.

Proprietary AI technology
Since 2017, we've been pioneering advan

Additional Information

Synthesia is the world's leading AI video platform for business, used by over 90% of the Fortune 100. Founded in 2017, the company is headquartered in London, with offices and teams across Europe and the US. As AI continues to shape the way we live and work, Synthesia develops products to enhance visual communication and enterprise skill development, helping people work better and stay at the center of successful organizations. Following our recent Series E funding round, where we raised $200 million, our valuation stands at $4 billion. Our total funding exceeds $530 million from premier investors including Accel, NVentures (Nvidia's VC arm), Kleiner Perkins, GV, and Evantic Capital, alongside the founders and operators of Stripe, Datadog, Miro, and Webflow.

