Machine Learning Engineer, Speech LLM Training - San Francisco

Plaud · San Francisco, CA

$200K–$540K/yr · Full-time · Remote · 1mo ago

Compliance · GDPR · HIPAA · Kubernetes · LLMs · Performance Optimization

Benefits

Founding Team Initiative: Opportunity to be an early, foundational member of our core SpeechLLM lab, with meaningful ownership and impact on a fast-growing startup.
Competitive Compensation: $200K - $540K base salary + performance bonus + equity.
Comprehensive Benefits: Top-tier healthcare for employees and dependents, including dental and vision, and a generous employer subsidy.
Retirement Planning: 401(k) plan for full-time employees with company matching.
Paid Time Off: Unlimited PTO, plus 13 paid holidays.
New Parent Leave: 12 weeks of paid time off to spend time with your new family, regardless of gender.
Hybrid Office: Minimum of 3 in-office days per week to foster highly collaborative, fast-paced research.
Gear & Perks: Choice of top-of-the-line laptops/workstations, annual offsites, and a fully stocked office.

Plaud is and will continue to be an equal opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristics.

Health insurance · Dental insurance · Vision insurance · 401(k) · Paid time off · Equity / stock options · Performance bonus

Additional Information

About Plaud Inc.

Plaud is building the world's most trusted AI work companion for professionals to elevate productivity and performance through note-taking solutions, loved by over 1,500,000 users worldwide since 2023. With a mission to amplify human intelligence, Plaud is building the next-generation intelligence infrastructure and interfaces to capture, extract, and utilize what you say, hear, see, and think.

Plaud Inc. is a Delaware-incorporated, San Francisco-based company pushing the boundary of human-AI intelligence through a hardware-software combination. With SOC 2, HIPAA, GDPR, ISO27001, ISO27701, and EN18031 compliance, Plaud is committed to the highest standards of data security and privacy protection.

To learn more about Plaud, please visit https://www.Plaud.ai and follow along on Instagram, X, Facebook, LinkedIn, and YouTube.

Why You Should Join Us

Plaud is building the next-generation intelligence infrastructure and interfaces to capture, extract, and utilize intelligence from what people say, hear, see, and think.
Plaud is a bootstrapped, skyrocketing, profitable company with a $250M revenue run rate achieved in just three years.
Define the next-gen paradigm for human-AI interaction.
Gain exposure to cutting-edge AI for Pro tools and play a direct role in our global expansion.
Work with passionate teammates who value innovation, collaboration, and customer success.
Grow your career in a culture that champions continuous learning and fast career development.
Market-competitive compensation, global exposure, and a vibrant, creativity-fueled work atmosphere.

You may be a good fit if you:

Have a proven track record of building and training large-scale audio or speech models from the ground up, whether that involves unified SpeechLLMs, advanced ASR, expressive TTS, or generative audio architectures.
Love living at the intersection of research and engineering, eager to design novel sequence modeling architectures one day and debug distributed training clusters the next.
Are highly comfortable traversing the entire stack, from fundamental signal processing and raw acoustic representations to massive foundation model training and edge-device optimization.
Possess deep expertise in PyTorch or JAX, with battle scars from optimizing large-scale distributed training runs, managing GPU memory utilization, and resolving complex performance bottlenecks.
Thrive in a fast-paced, high-growth startup environment where you are expected to take extreme ownership of ambiguous problems and drive them directly into production.
Are obsessed with building AI systems that natively understand and generate speech, ultimately creating a hardware-software AI companion that amplifies human productivity.

Strong candidates may also have experience with:

Text-based LLMs: Hands-on experience with core text-based Large Language Model pretraining, instruction tuning, or RLHF.
Neural Audio Codecs: Hands-on experience designing and training state-of-the-art neural audio codecs for streamable, high-fidelity audio.
Generative Architectures: Designing and training diffusion models, flow matching, or autoregressive architectures specifically for speech and voice generation.
Alignment & Steerability: Applying Reinforcement Learning (RL) techniques (like RLHF or GRPO) to improve conversational cadence, steerability, and alignment in foundation models.
Deep System Optimization: End-to-end inference and performance optimization, leveraging high-throughput serving frameworks (e.g., vLLM, TensorRT-LLM, SGLang) to minimize latency for real-time cloud streaming.
Large-Scale Infrastructure: Managing massive GPU clusters, utilizing advanced distributed training frameworks (e.g., FSDP, DeepSpeed), and navigating orchestration tools like Kubernetes.

Interested in this role?

Apply on the company's website.
