AI Engineer (Vision)

Whitecircle · Paris, France

$100K–$250K/yr · Full-time · Remote · 5mo ago

Datadog · LLMs · PyTorch

About the role

White Circle is an AI Safety company building the safety, reliability, and optimization layer for AI systems. At the core of our platform are policies - simple natural-language rules that define what an AI model should and shouldn't do. We automatically test, enforce, and continuously improve these policies at scale.

- We've raised $11M from top funds, founders, and senior leaders at OpenAI, Anthropic, HuggingFace, Mistral, DeepMind, Datadog, Sentry, and others
- We process over one hundred million API calls every month
- We fine-tune and train our own LLMs so they run faster and cheaper than any open or proprietary model

We're a small, highly focused team. If you want to work deeply on hard problems, see your work ship to production quickly, and influence how AI safety is actually built - you're the one we need.

You will:

- Train vision-language models from scratch and fine-tune existing architectures for image understanding
- Extend VLM capabilities to video: design temporal modeling approaches, handle long-context efficiently
- Design evaluation benchmarks that matter: visual QA, spatial reasoning, video comprehension
- Curate and maintain multimodal datasets - including synthetic data generation pipelines
- Train and optimize MoE architectures for efficient multimodal inference
- Deploy models to production: quantization, batching strategies, latency optimization

You'll fit right in if you:

- 3+ years training and fine-tuning vision-language models (LLaVA, Qwen-VL, InternVL, or similar)
- Deep experience with multimodal architectures - you understand how vision encoders, projectors, and LLMs fit together
- Hands-on with RLHF/alignment for multimodal: GRPO, DPO, reward modeling - not just for text
- Experience with video understanding: temporal modeling, long-context processing, efficient attention mechanisms
- Track record shipping VLMs to production: you've optimized inference, not just reported benchmark scores
- Comfortable with large-scale dataset curation: image-text pairs, video-instruction data, synthetic data generation
- Familiar with MoE architectures and their tradeoffs for multimodal workloads
- Strong PyTorch skills, experience with distributed training (DeepSpeed, FSDP)

Why White Circle

- Salary of $100,000 to $250,000 + equity
- Paid time off in line with your local regulations, no matter where you work from
- Work from Paris (hybrid) + relocation package
- Best medical insurance in France
- All the hardware, tools, and services you need
- Covered subscriptions for AI agents and IDEs
- Team off-sites twice a year: we've recently been to the Alps and to Saint-Tropez

How we hire

1. Intro call with one of our colleagues
2. Complete the take-home assignment
3. Show your best during the technical interview
4. Final call with our CEO and CTO

Please submit your application in English - it's our company language so you'll be speaking lots of it if you join.

Benefits

Vision insurance · Equity / stock options

Additional Information

TLDR: You'll train and fine-tune vision-language models, extend them to video, build alignment pipelines (GRPO, DPO, reward modeling), develop evaluation benchmarks, optimize inference for production, and work with MoE architectures.
