Research Intern RL & Post-Training Systems, Turbo (Fall 2026)
About the role
The Turbo Research team investigates how to make post-training and reinforcement learning for large language models efficient, scalable, and reliable. Our work sits at the intersection of RL algorithms, inference systems, and large-scale experimentation, where the cost and structure of inference dominate overall training efficiency and shape what learning algorithms are practical. As a research intern, you will study RL and post-training methods whose performance and scalability are tightly coupled to inference behavior, co-designing algorithms and systems rather than treating them independently. Projects aim to unlock new regimes of experimentation (larger models, longer rollouts, and more complex evaluations) by rethinking how inference, scheduling, and training interact.
Requirements
- Pursuing a PhD or MS in Computer Science, EE, or a related field (exceptional undergraduates considered)
- Have research experience in one or more of:
  - RL or post-training for large models (e.g., RLHF, RLAIF, GRPO, preference optimization)
  - ML systems (inference engines, runtimes, distributed systems)
  - Large-scale empirical ML research or evaluation
- Are comfortable with empirical research: designing controlled experiments, interpreting noisy results, and drawing principled conclusions
- Can work across abstraction layers:
  - Strong Python skills for experimentation
  - Willingness to modify inference or training systems (experience with C++, CUDA, or similar is a plus)
Example Research Directions
Intern projects are tailored to your background and interests, and may include:
- Inference-Aware RL & Post-Training
  - Designing RL or preference-optimization objectives that explicitly account for inference cost and structure (e.g., speculative decoding, partial rollouts, controllable sampling).
  - Studying how inference-time approximations affect learning dynamics in GRPO-, RLHF-, RLAIF-, or DPO-style methods.
  - Analyzing bias, variance, and stability trade-offs introduced by accelerated inference within RL loops.
- RL-Centric Inference Systems
  - Developing inference mechanisms that support deterministic, reproducible RL rollouts at scale.
  - Exploring batching, scheduling, and memory-management strategies optimized for RL and evaluation workloads rather than pure serving.
  - Investigating how KV-cache policies, sampling controls, or runtime abstractions influence learning efficiency.
- Scaling Laws & Cost-Quality Trade-offs
  - Empirically characterizing how reward improvement and generalization scale with rollout cost, latency, and throughput.
  - Quantifying when systems-level optimizations change algorithmic behavior rather than only reducing runtime.
  - Identifying regimes where inference efficiency unlocks qualitatively new learning capabilities.
- Evaluation & Measurement
  - Designing rigorous benchmarks and diagnostics for post-training and RL efficiency.
  - Studying failure modes in long-horizon training and how system constraints shape outcomes.
- Publications at leading ML and NLP conferences (such as NeurIPS, ICML, ICLR, ACL, or EMNLP)
- Understanding of model optimization techniques and hardware acceleration approaches
- Contributions to open-source machine learning projects
Internship Program Details
Our fall internship program spans 12 to 16 weeks, during which you will work with industry-leading engineers building a cloud from the ground up and may contribute to influential open-source projects. The internship runs from September 14th to December 18th.
About Together AI
Benefits