VLM Research Engineer (m/f/d)

Deltia · Onsite Berlin
Full-time · On-site · 6mo ago
Computer Vision · Deep Learning · Machine Learning · Python · PyTorch · Robotics

Requirements

  • Publications at top-tier venues (CVPR, ICCV, ECCV, NeurIPS, ICLR, etc.) on video, multimodal learning or scene understanding
  • Experience with 3D/4D scene representations, action generation or embodied / sense-plan-act style projects
  • Inference optimisation: quantisation, TensorRT, model distillation, or deployment on constrained hardware
  • Prior experience in a startup or applied research lab environment

Benefits

  • Employee Share Options Program for all permanent employees*
  • A growing benefits list, currently including Urban Sports Club and quarterly team retreats
  • Be at the forefront of defining what artificial intelligence means in manufacturing
  • Gain hands-on experience working in an AI-first software company
  • Supportive and inclusive culture that values diversity and promotes the advancement of underrepresented groups within the company
  • Collaborate with a diverse (currently more than 10 nationalities) and talented team, working on cutting-edge projects with real-world impact
  • Network with professionals and leaders in the field, opening doors to potential future career opportunities
  • Very flat hierarchy, open 360° feedback, and flexible working hours
  • Ethics ⚖: We are committed to developing ethical AI software
  • Vision insurance
  • Flexible schedule
  • Equity / stock options

Don't meet all the requirements?

*Only full-time, permanent roles are eligible for stock options. Part-time, contract, working-student, internship and freelance roles are not eligible for stock options.

Additional Information

We're looking for a Research Engineer to push the limits of vision-language models for real-world video understanding. You'll work on applied, state-of-the-art multimodal models and turn them into production pipelines used by customers.

Your role

  • Design and adapt vision-language and video models for scene understanding, temporal reasoning and activity / action recognition
  • Build and maintain large-scale training and evaluation pipelines on GPU clusters
  • Curate and augment video-text and action datasets, including synthetic labels and retrieval-based augmentation
  • Develop robust benchmarks for video QA, instruction following and temporal understanding, and use them to drive iterative model improvements
  • Cut and refactor model architectures for efficiency and deployability (compression, pruning, distillation)
  • Deliver production-ready inference pipelines to product and customer teams, working closely with CV, platform and robotics engineers

You bring

  • Completed PhD (or equivalent research track record) in computer vision, machine learning, robotics or a related field
  • Strong background in video-centric deep learning: scene understanding, temporal / activity / action recognition, or video generation
  • Experience training and adapting large vision or VLM models (e.g. InternVL, Qwen-VL, DeepSeek-VL, similar stacks)
  • Proven work with multi-GPU training (PyTorch, distributed, mixed precision) and large-scale datasets
  • Solid engineering habits: clean Python, reproducible experiments, reliable data and training pipelines
  • Track record of moving research into usable systems (demos, internal tools, or productised features) in fast-moving teams


Interested in this role?

Apply on the company's website.
