Research Scientist - Privacy-Preserving Large-Scale Model Training & Architecture Optimization

TikTok · San Jose, CA
Full-time · On-site · 2mo ago · 30+ days old, may be filled
Compliance · Deep Learning · Performance Optimization · PyTorch

About the role

Privacy Innovation (PI) Lab was established to explore the next frontier of privacy technology and theory in the digitalized world. We provide key insights and technical solutions on privacy-related innovation for all of TikTok's products. We also collaborate with technical and academic communities worldwide to build an open ecosystem that promotes a privacy-friendly digital experience.

We are building next-generation generative foundation models, with a strong focus on diffusion-based and unified generation-understanding architectures, deployed in privacy-sensitive production environments. This role sits at the intersection of:

  • Large-scale model training systems
  • GPU-first architecture and kernel-level optimization
  • Diffusion / DiT / unified multimodal foundation models
  • Privacy-preserving and compliant training pipelines

You will work on end-to-end training architecture design, from model-parallel execution and GPU efficiency to robust, fault-tolerant, privacy-aware training infrastructure.

Responsibilities

At TikTok, we treat privacy as our top priority in product design and implementation. Privacy is not just about regulatory compliance, but also about a more trusted way to enable technology innovation by respecting users' privacy choices.

Model Training Architecture & Systems

  • Design and optimize large-scale training architectures for diffusion-based and unified generative models (e.g., DiT, Rectified Flow, hybrid AR + diffusion systems).
  • Lead GPU-centric performance optimization, including memory layout, communication overlap, kernel fusion, and throughput scaling across thousands of accelerators.
  • Develop and evolve distributed training strategies (DP / TP / PP / ZeRO / FSDP-style sharding) tailored to long-running, multi-stage foundation model training.

Robustness, Reliability & Production Readiness

  • Build fault-tolerant, self-healing training systems that can sustain long-running jobs under frequent hardware, network, and software failures.
  • Design mechanisms for fast failure detection, recovery, and minimal training interruption, including checkpointing strategies, restart policies, and controlled rollouts.
  • Improve training ETTR / MFU / utilization efficiency under real-world production constraints.

Diffusion & Unified Model Optimization

  • Optimize Diffusion Transformer training pipelines, including noise schedules, timestep strategies, and memory-efficient attention mechanisms.
  • Support unified generation-and-understanding models, enabling shared context, long-sequence multimodal reasoning, and scalable training without architectural bottlenecks.
  • Collaborate with research teams on architecture-level tradeoffs between quality, compute efficiency, and training stability.

Requirements

  • Strong background in large-scale deep learning systems and distributed training.
  • Hands-on experience with GPU optimization, including memory management, communication/computation overlap, and performance profiling.
  • Experience training diffusion models, DiT-style architectures, or large foundation models at scale.
  • Proficiency in PyTorch and modern distributed training stacks.
  • Solid understanding of parallelism strategies (DP / TP / PP / ZeRO / FSDP or equivalents).
  • Ability to reason about training stability, numerical issues, and long-running job robustness.
  • Experience with privacy-preserving ML, sensitive data training, or regulated environments.
  • Familiarity with fault-tolerant training systems, checkpointing strategies, or production GPU orchestration.
  • Experience with unified multimodal models (generation + understanding) or hybrid AR/diffusion systems.
  • Low-level performance work (CUDA kernels, custom ops, fused attention, or communication libraries).
  • Background in production ML infrastructure supporting thousands of GPUs.

Job Information

  • [For Pay Transparency] Compensation Description (annually)
  • The base salary range for this position in the selected city is $156,000 - $316,800 annually.
  • The Company res
