Computer Vision Researcher (VLM)
Responsibilities
- Architect Semantic Grounding: Lead research into cross-modal grounding that connects 3D spatial features with language embeddings, enabling the Large Geospatial Model (LGM) to "understand" object relationships and environmental context.
- Scale "Understand" Capabilities: Develop and deploy algorithms for continuous semantics, allowing our 3D maps to evolve and improve their situational awareness as new ground-level and aerial data is ingested.
- Agentic Frameworks: Build the "spatial brain" for Embodied AI, enabling robots, drones, and other machines to move beyond simple navigation to mission-level reasoning.
- Multimodal Benchmarking: Define the standards for measuring "spatial common sense" in LLMs, creating evaluations that test a model's ability to interpret and operate within complex 3D scenes.
- Technical Mentorship: Serve as the technical anchor for the London R&D hub, resolving architectural disagreements and mentoring the next generation of researchers in the fusion of 3D CV and NLP.
- Collaborative Innovation: Partner with Product leads to ensure the "Understand" API delivers high business value for enterprise customers in robotics, logistics, and field operations.
Required Qualifications
- Education: PhD (or equivalent) in Computer Vision, Machine Learning, or Robotics with a focus on Multimodal/Semantic understanding.
- Years of Experience: 4+ years of experience in ML research, with a proven track record of shipping models that bridge 3D Vision and Language.
- Technical Depth: Expert knowledge of 3D Geometry (SfM, SLAM, VPS) and Transformer-based architectures (VLMs).
- Research Impact: Multiple first-author publications at top-tier venues (CVPR, NeurIPS, ICLR) focusing on VLMs, scene understanding or semantic segmentation.
- Implementation Mastery: Ability to write production-quality research code in PyTorch or JAX and manage large-scale data pipelines.
- Required In-Office Days: 3 days per week
Plus If
- Experience with Gaussian Splatting or NeRFs for semantic scene representation.
- Background in robotics (ROS) or building agentic systems that interact with physical environments.
- Experience with "open-set" recognition and Zero-Shot learning.
Candidate Privacy Policy
I understand that by submitting my job application, the information I provide as part of that application will be used in accordance with Niantic Spatial's Privacy Notice for Job Applicants and Candidates.
Additional Information
At Niantic Spatial, we're building the future of geospatial AI. Powered by a proprietary database of over 30 billion posed images and a groundbreaking third-generation digital map, our mission is to develop spatial intelligence that helps both humans and machines better understand, navigate, and engage with the physical world. Our high-fidelity mapping technology unlocks a new dimension of interaction, laying the foundation for AI to truly comprehend and operate within real-world environments. Join us as we build a living model of the world that people and machines can talk to.
As a Computer Vision Researcher with experience in Large Language Models (LLMs), you will bridge the gap between 3D computer vision and LLMs, creating a unified framework where machines can reason about their surroundings. By linking spatial geometry directly to language, you will enable our systems to perform context-aware navigation and answer complex, open-ended questions about the physical world.