LLM Training Engineer

Sciforium · San Francisco
$155K–$220K/yr · Full-time · On-site · 2mo ago
Caching · Deep Learning · ETL · LLMs · Machine Learning · Observability

About the role

As an LLM Training Engineer, you'll work across the full foundation-model stack: pretraining and scaling, post-training and Reinforcement Learning, sandbox environments for evaluation and agentic learning, and deployment + inference optimization. You'll build and iterate quickly on research ideas, contribute production-grade infrastructure, and help deliver models that can serve real-world use cases at scale.

Responsibilities

This role spans multiple tracks; candidates may focus on one or contribute across several. Examples include:

  • Pretraining & Scaling
      • Train large byte-native foundation models across massive, heterogeneous corpora
      • Design stable training recipes and scaling laws for novel architectures
      • Improve throughput, memory efficiency, and utilization on large GPU clusters
      • Build and maintain distributed training infrastructure and fault-tolerant pipelines
  • Post-training & RL
      • Develop post-training pipelines (SFT, preference optimization, RLHF/RLAIF, RL)
      • Curate and generate targeted datasets to improve specific model capabilities
      • Build reward models and evaluation frameworks to drive iterative improvement
      • Explore inference-time learning and compute techniques to enhance performance
  • Sandbox Environments & Evaluation
      • Build scalable sandbox environments for agent evaluation and learning
      • Create realistic, high-signal automated evals for reasoning, tool use, and safety
      • Design offline + online environments that support RL-style training at scale
      • Instrument environments for observability, reproducibility, and iteration speed
  • Deployment & Inference Optimization
      • Optimize inference throughput/latency for byte-native architectures
      • Build high-performance serving pipelines (KV caching, batching, quantization, etc.)
      • Improve end-to-end model efficiency, cost, and reliability in production
      • Profile and optimize GPU kernels, runtime bottlenecks, and memory behavior

Ideal candidate credentials

  • Technical strength
      • Strong general software engineering skills (writing robust, performant systems)
      • Experience with training or serving large neural networks (LLMs or similar)
      • Solid grasp of deep learning fundamentals and modern literature
      • Comfort working in high-performance environments (GPU, distributed systems, etc.)
  • Relevant experience (one or more)
      • Pretraining / large-scale distributed training (FSDP/ZeRO/Megatron-style systems)
      • Post-training pipelines (SFT, RLHF/RLAIF, preference optimization, eval loops)
      • Building RL environments, simulators, or agent frameworks
      • Inference optimization, model compression, quantization, kernel-level profiling
      • Building large ETL pipelines for internet-scale data ingestion and cleaning
      • Owning end-to-end production ML systems with monitoring and reliability
  • Research orientation
      • Ability to propose and evaluate research ideas quickly
      • Strong experimental hygiene: ablations, metrics, reproducibility, analysis
      • Bias toward building: you can turn ideas into working code and results
  • Education
      • MS or PhD in Computer Science, Machine Learning, AI, Mathematics, or related field

Benefits include

  • Medical, dental, and vision insurance
  • 401k plan
  • Daily lunch, snacks, and beverages
  • Flexible time off
  • Competitive salary and equity

Equal opportunity

Sciforium is an equal opportunity employer. All applicants will be considered for employment without attention to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.

Benefits

Dental insurance · Vision insurance · 401(k) · Flexible schedule · Equity / stock options

Additional Information

Sciforium is an AI infrastructure company developing next-generation multimodal AI models and a proprietary, high-efficiency serving platform. Backed by multi-million-dollar funding and direct sponsorship from AMD, with hands-on support from AMD engineers, the team is scaling rapidly to build the full stack powering frontier AI models and real-time applications.

