Member of Technical Staff - Data & ML Infra Engineer

Moonlake · San Francisco, CA

Full-time · On-site · 8mo ago

Design Systems · Grafana · Kubernetes · Observability · OpenTelemetry · Performance Optimization

About the role

Moonlake is building the frontier of interactive world models: systems that generate, simulate, and reason over 3D environments for embodied AI, robotics and gaming. We develop the simulation infrastructure to build worlds (e.g., assets, scenes, digital twins) at scale.

Our team sits at the intersection of:
Embodied AI
Robotics simulation
Interactive 3D worlds
World models
Real-time generation
AI infrastructure

Moonlake is building the next generation of AI infrastructure for interactive digital worlds. Our mission is to enable anyone to create, simulate, and interact with rich environments using natural language and multimodal inputs, turning simple ideas into worlds with structure, logic, and agents that can perceive and act.

Our team has raised $28M in seed funding from NVIDIA Ventures, Threshold Ventures, AIX Ventures and notable angels including Naval Ravikant and Jeff Dean to build the foundational layer for the future of AI - powering everything from creative tools and games to robotics training, simulations, and digital twins. Our goal is to make building and experimenting with these environments as accessible and scalable as publishing video on the internet.

We are looking for exceptional research engineers and applied researchers to help push the frontier of interactive AI.

We're looking for a Member of Technical Staff - Data & ML Infrastructure Engineer to help build and optimize the systems that power Moonlake's model training and inference infrastructure. This role sits at the core of Moonlake's platform and focuses on one mission:
Improve throughput, latency, and cost - deploying models 2-10× faster and cheaper without quality regressions.

You'll work across GPU kernels, inference systems, distributed training, serving infrastructure, observability, and large-scale orchestration systems.

This is a highly technical systems role intended for engineers who enjoy operating at the intersection of:
ML systems
Distributed infrastructure
GPU optimization
Production AI deployment
Performance engineering

This role emerged directly from Moonlake's need to better support large-scale world-model training and deployment infrastructure.

Responsibilities

Optimize large-scale model training and inference systems
Improve GPU utilization, latency, throughput, and deployment efficiency
Build infrastructure that supports real-time world-model and multimodal workloads
Develop and optimize serving pipelines for frontier AI systems
Work closely with research teams to productionize high-performance models
Build scalable orchestration and observability systems for distributed AI infrastructure
Improve reliability, rollout safety, autoscaling, and production monitoring
Design systems that support fast experimentation without sacrificing stability

Scope of Work

GPU Performance Optimization
CUDA / Triton kernels
FlashAttention family
Paged attention
CUDA Graphs
Memory optimization
Kernel-level performance tuning

Model Serving & Inference
TensorRT-LLM
Triton Inference Server
vLLM / TGI
Continuous batching
On-GPU KV cache reuse
Speculative decoding / Medusa
Mixture-of-agents routing

Distributed Training & Parallelism
FSDP / ZeRO
Tensor parallelism
Pipeline parallelism
Expert parallelism
NCCL tuning
Multi-node GPU orchestration

Quantization & Efficient Fine-Tuning
AWQ / GPTQ / FP8
LoRA / DoRA serving
Efficient deployment pipelines

Infrastructure & Systems
Ray
Kubernetes
Argo
Autoscaling systems
Canary deployments & rollback infrastructure
A/B experimentation systems
Observability stack: Prometheus, Grafana, OpenTelemetry

Why This Role Matters

Moonlake's products require real-time, highly efficient AI infrastructure capable of powering interactive worlds and embodied intelligence systems at scale.

The difference between:
200ms and 2s latency
40% and 90% GPU utilization
Stable rollout and catastrophic regression
...directly impacts the company's ability to train, deploy, and scale world-model systems.

You'll help define the infrastructure foundation behind the next generation of interactive AI systems.

We are committed to being an on-site, in-person team currently based in San Francisco.

Additional Information

Introducing Moonlake, AI for creating world simulations.
