Member of Technical Staff - Multi-Modal, Vision

Liquid AI · San Francisco
Full-time · Remote · 7mo ago
Computer Vision · Deep Learning · GitHub · Hugging Face · Python · Reinforcement Learning

About the role

The VLM team builds vision-language models that run on-device, under tight latency and memory constraints, without sacrificing quality. We have released four best-in-class models and we're just getting started. This team owns the full VLM pipeline end-to-end: from researching new architectures and training algorithms through data curation, evaluation, and deployment. You'll join a focused, hands-on group that works directly on models and collaborates closely with our pretraining, post-training, and infrastructure teams. Success here is measured by the capability of the models we ship.

Minimum qualifications:
- Hands-on experience in training or evaluating VLMs with demonstrated experimental rigor.
- Ability to turn research ideas into scalable implementations, and to refine and iterate on hypotheses.
- Proficiency in Python and at least one deep learning framework.
- M.S. or Ph.D. in Computer Science, Mathematics, or a related field; or equivalent industry experience.

This role is for you if you have experience in some of the following:
- Building or optimizing multimodal training or data pipelines.
- Distributed training (DeepSpeed, FSDP, Megatron-LM, etc.).
- Multimodal post-training (SFT, preference optimization, RL-style methods).
- Dataset design and data quality (quality and diversity assessment, long-tail mining).
- Open-source contributions (code, data, models) on GitHub or Hugging Face.
- Published research at top AI conferences (NeurIPS, ICML, CVPR, ECCV, ICLR, ACL, etc.).
- Computer vision or visual representation learning.

What working here might look like:
- Lead a new model capability end-to-end from task spec through data curation, training recipe, ablations, evaluation, and into the final shipped model.
- Improve visual reasoning through reinforcement learning and preference optimization methods.
- Push the quality-efficiency frontier on token efficiency via encoder/connector design. Exemplary outcome: a connector that cuts vision tokens without quality loss.

What Success Looks Like (Year One):
- The VLMs we ship are state-of-the-art.
- You own a major work-stream (for instance, video understanding, preference data quality, or encoder architecture) end-to-end.
- At least one model has shipped to production with your direct contribution.

Benefits

- Full ownership: You own your work from architecture to deployment.
- Compensation: Competitive base salary with equity in a unicorn-stage company.
- Health: We pay 100% of medical, dental, and vision premiums for employees and dependents.
- Financial: 401(k) matching up to 4% of base pay.
- Time Off: Unlimited PTO plus company-wide Refill Days throughout the year.

Health insurance · Dental insurance · Vision insurance · 401(k) · Paid time off · Equity / stock options

Additional Information

About Liquid AI

Spun out of MIT CSAIL, we build general-purpose AI systems that run efficiently across deployment targets, from data center accelerators to on-device hardware, ensuring low latency, minimal memory usage, privacy, and reliability. We partner with enterprises across consumer electronics, automotive, life sciences, and financial services. We are scaling rapidly and need exceptional people to help us get there.

