Member of Technical Staff, Vision / Language

XDOF · San Mateo · Hybrid

Full-time · Remote · 2w ago

Machine Learning · PyTorch · Robotics

Responsibilities

Design and implement vision-language pipelines for egocentric and teleoperation video: structured captioning, temporal grounding, action-conditioned scene understanding, and semantic annotation at scale
Develop and evaluate representations that bridge visual perception, language, and low-level robot action - spanning VLAs, video prediction, and world models
Build and improve data curation systems that assess quality, diversity, and coverage of large-scale robot demonstration datasets
Work hands-on with bimanual and high-DoF manipulation data, including real teleoperation footage and sim-generated rollouts
Collaborate directly with partner labs to define data requirements and close the loop between data quality and downstream policy performance
Stay current on the research frontier (VLAs, video foundation models, flow matching, DiT architectures, egocentric pretraining) and translate insights into production systems

Required

MS or PhD in Computer Science, Robotics, Machine Learning, or a related field from a top-tier program
3-7 years of research or applied research experience (industry or academic) in one or more of: vision-language models, video understanding, robot learning, or generative modeling
Deep fluency in PyTorch; working knowledge of large-scale training infrastructure (distributed training, mixed precision, large batch workflows)
Published work or demonstrable impact in VLMs/VLAs, video representation learning, imitation learning, or a closely related area
Strong engineering fundamentals - you can design clean systems, not just run experiments

Benefits

Competitive compensation and equity
Comprehensive health and wellness benefits
Flexible work arrangements
Collaborative and fast-paced work environment
Opportunity to shape the future of robotics and AI alongside an ambitious, values-driven team

Level: Mid Level to Senior Research Scientist (L4-L5 equivalent)
Location: San Mateo
Note: Junior candidates will still be considered

If you're excited to help build the infrastructure powering tomorrow's intelligent machines, we'd love to hear from you!

Health insurance · Vision insurance · Flexible schedule · Equity / stock options

Additional Information

About XDOF

Frontier labs are racing to build general-purpose robots, and the bottleneck isn't compute. It's data. At XDOF, we're building the foundation behind the foundation models: the data collection systems, annotation pipelines, exabyte-scale data infrastructure, and software toolchain that enable our partners to push the field forward.

We're hiring a Research Engineer / Scientist to help lead technical efforts at the intersection of vision-language models and robot learning. You will build systems that turn raw egocentric and teleoperation video into high-signal training data for VLA models, and increasingly, contribute to the models themselves.

Beyond pipelines, you will drive research into what makes robot data useful: discovering new metadata (contact events, affordance labels, implicit reward signals, dynamics priors from video) that unlock capabilities current approaches miss. You'll explore how structured annotations can improve cross-embodiment transfer, automatic curriculum generation, and world models that predict what actually matters for manipulation.

The data layer isn't downstream of the research. It is the research.

Interested in this role?

Apply on the company's website.
