Machine Learning Infrastructure Engineer

Astera Labs · San Jose, CA

$140K–$165K/yr · Full-time · On-site · Today

AWS · Caching · GCP · Incident Response · Machine Learning · MLOps

About the role

We're hiring a Machine Learning Infrastructure Engineer to build the runtime, platform, and operational backbone for modern AI systems. This role is for someone who wants to work on the systems behind the systems: model access layers, routing, serving paths, telemetry, observability, evaluation infrastructure, and the controls needed to make fast-moving AI work reliable in practice. This is a platform role, but not in the old sense. The work is tightly coupled to how modern AI systems are actually built and used: multiple model providers, agent runtimes, skill and tool layers, inference telemetry, cost-aware routing, AI spend visibility, and governance that is strong enough for real internal adoption.

Responsibilities

Build and improve internal AI infrastructure for LLM applications, agents, retrieval systems, and model-backed engineering workflows.
Own inference deployment paths across managed and self-serve environments, including access control, monitoring, and operational reliability.
Build platform layers such as model gateways, routing, runtime integrations, telemetry, and controls for safe execution at scale.
Develop AI Ops capabilities across evaluation, release readiness, observability, incident triage, regression detection, and cost monitoring.
Build dashboards, tracing, logging, and alerting for production AI systems, including spend and usage visibility across tools and teams.
Improve performance and unit economics through routing, caching, batching, failover, and latency/cost optimization.
Create reusable APIs, SDKs, and platform abstractions that make AI systems easier to deploy, evaluate, govern, and operate.

Requirements

1-5 years of experience in software engineering, ML infrastructure, MLOps, platform engineering, or related backend/infrastructure roles.
Strong Python plus strong systems instincts.
Experience with AWS or GCP and real production service ownership.
Familiarity with inference deployments, model APIs, gateways, serving systems, or runtime infrastructure for LLM/ML workloads.
Experience with observability, telemetry, reliability engineering, and incident response.
Understanding of eval systems, release workflows, retrieval-backed systems, and debugging non-deterministic AI behavior.
Ability to translate messy platform needs into scalable internal infrastructure.

The base pay range for this role is $140,000 to $165,000 per year.

Benefits

Flexible schedule

Additional Information

Astera Labs (NASDAQ: ALAB) provides rack-scale AI infrastructure through purpose-built connectivity solutions. By collaborating with hyperscalers and ecosystem partners, Astera Labs enables organizations to unlock the full potential of modern AI. Astera Labs' Intelligent Connectivity Platform integrates CXL®, Ethernet, NVLink, PCIe®, and UALink™ semiconductor-based technologies with the company's COSMOS software suite to unify diverse components into cohesive, flexible systems that deliver end-to-end scale-up and scale-out connectivity. The company's custom connectivity solutions business complements its standards-based portfolio, enabling customers to deploy tailored architectures to meet their unique infrastructure requirements. Discover more at www.asteralabs.com.

Location: San Jose, CA

Experience: 1-5 years

Team: Applied AI
