Machine Learning Infrastructure Engineer, Model Inference

Abridge · San Francisco
$221K–$260K/yr · Full-time · Remote · 9mo ago
Ansible · Documentation · Generative AI · Kubernetes · Leadership · Machine Learning

About the role

As an ML Infrastructure Engineer, Model Inference at Abridge, you'll play a pivotal role in building and optimizing the core inference infrastructure that powers our machine learning models. Your work will be instrumental in enhancing the scalability, efficiency, and performance of our AI-driven solutions. You will work with our Infrastructure and Research teams to build, deploy, optimize and orchestrate across our AI models.

Responsibilities

  • Design, deploy, and maintain scalable Kubernetes clusters for AI model inference and training.
  • Develop, optimize, and maintain ML model serving infrastructure, ensuring high performance and low latency.
  • Collaborate with ML and product teams to scale backend infrastructure for AI-driven products, focusing on model deployment, throughput optimization, and compute efficiency.
  • Optimize compute-heavy workflows and enhance GPU utilization for ML workloads.
  • Build a robust model API orchestration system.
  • Collaborate with leadership to define and implement strategies for scaling infrastructure as the company grows, ensuring long-term efficiency and performance.

Requirements

  • Strong experience in building and deploying machine learning models in production environments.
  • Deep understanding of container orchestration and distributed systems architecture.
  • Expertise in Kubernetes administration, including custom resource definitions, operators, and cluster management.
  • Experience developing APIs and managing distributed systems for both batch and real-time workloads.
  • Excellent communication skills, with the ability to interface between research and product engineering.

Ideally, You Have

  • Expertise with model serving frameworks such as NVIDIA Triton Inference Server, vLLM, and TRT-LLM.
  • Expertise with ML toolchains such as PyTorch, TensorFlow, or distributed training and inference libraries.
  • Familiarity with GPU cluster management and CUDA optimization.
  • Knowledge of infrastructure as code (Terraform, Ansible) and GitOps practices.
  • Experience with container registries, image optimization, and multi-stage builds for ML workloads.
  • Experience orchestrating across ASR models or LLMs to build GenAI applications.

Why Work at Abridge?

Joining Abridge means stepping into a fast-paced, high-growth startup where your contributions truly make a difference. Our culture requires extreme ownership: every employee has the ability (and is expected) to make an impact on our customers and our business.

We're committed to supporting your growth, both professionally and personally. Whether it's flexible work hours, an inclusive culture, or ongoing learning opportunities, we are here to help you thrive and do the best work of your life.

If you are ready to make a meaningful impact alongside passionate people who care deeply about what they do, Abridge is the place for you.

How we take care of Abridgers:

  • Generous Time Off: 14 paid holidays, flexible PTO for salaried employees, and accrued time off for hourly employees.
  • Comprehensive Health Plans: Medical, Dental, and Vision coverage for all full-time employees and their families.
  • Generous HSA Contribution: If you choose a High Deductible Health Plan, Abridge makes monthly contributions.

Benefits

Health insurance · Dental insurance · Vision insurance · Paid time off · Flexible schedule

Additional Information

About Abridge

Abridge was founded in 2018 with the mission of powering deeper understanding in healthcare. Our AI-powered platform was purpose-built for medical conversations, improving clinical documentation efficiencies while enabling clinicians to focus on what matters most: their patients. Our enterprise-grade technology transforms patient-clinician conversations into structured clinical notes in real time, with deep EMR integrations. Powered by Linked Evidence and our purpose-built, auditable AI, we are the only company that maps AI-generated summaries to ground truth, helping providers quickly trust and verify the output. As pioneers in generative AI for healthcare, we are setting the industry standards for the responsible deployment of AI across health systems. We are a growing team of practicing MDs, AI scientists, PhDs, creatives, technologists, and engineers working together to empower people and make care make more sense. We have offices located in the Mission District in San Francisco, the SoHo neighborhood of New York, and East Liberty in Pittsburgh.

