Member of Technical Staff, Infrastructure Engineer

Odysseyml · Palo Alto
Full-time · On-site · 3mo ago
Core Data · Deep Learning · Docker · Kubernetes · Python · Robotics

About the role

Odyssey is an AI lab pioneering general-purpose world models: causal, multimodal systems that learn to predict and interact with the world over long horizons, while generating real-time, interactive simulations from any starting point. This foundational technology promises to revolutionize robotics, science, healthcare, education, gaming, defense, and beyond.

Responsibilities

  • Develop and operate our low-latency model inference platform, ensuring high availability, scalability, and efficient resource utilization for Odyssey's world models.
  • Engineer and scale our core data processing infrastructure (e.g., Flyte, Ray with k8s) to handle petabyte-scale datasets.
  • Design, build, and maintain our large-scale, GPU-based training clusters for deep learning, focusing on usability, high throughput, and reliability.
  • Automate infrastructure provisioning, configuration, monitoring, and alerting using Infrastructure as Code (IaC) principles.
  • Drive performance tuning, cost optimization, and reliability improvements across the entire stack.
  • Collaborate closely with researchers and product developers to understand their requirements, optimize their workflows, and improve platform usability.

Requirements

  • Motivated by building for the frontier: you want to shape the compute and infrastructure foundation of a lab redefining how people create and interact with media.
  • Strong programming skills (e.g., Python, Go, or similar) and a solid understanding of software engineering best practices.
  • Deep, hands-on experience with containerization (e.g., Docker), container orchestration (e.g., Kubernetes), and Infrastructure as Code (e.g., Terraform).
  • Proven experience building and managing large-scale, distributed systems with GPU computational workloads (e.g., compute platforms, data pipelines, or high-availability services).
  • Experienced in designing infrastructure for ML workloads where performance, parallelism, and data movement are critical.
  • A collaborative mindset and excellent communication skills, with a passion for building developer-friendly platforms.

Benefits

  • Health insurance
  • Vision insurance
