Senior Applied Data Scientist, Retrieval and Semantic Systems
Responsibilities
- Design, build, and ship production retrieval systems that power AI product capabilities across multiple products
- Own vector store selection and operation, including scalability, latency, reliability, cost, and multi-tenant design
- Build and operate indexing and refresh pipelines: chunking, embedding generation, backfills, deletes, and versioned indices
- Implement semantic and hybrid retrieval: embeddings, similarity search, lexical and vector combination, metadata filtering, and reranking
- Own retrieval evaluation as a first-class system: gold sets, offline relevance metrics, slice analysis, drift detection, and regression gates that block bad changes from shipping
- Make and defend the core tradeoffs of the domain: relevance against latency against cost against operational complexity
- Partner with AI Platform and infrastructure engineers on deployment, observability, and reliability, and with product and research partners on relevance requirements
Requirements
- 6+ years of experience building and shipping production machine learning or applied AI systems
- Proven ownership of a retrieval system in production, including vector store selection and operation
- Strong Python skills and experience building services and APIs (for example, FastAPI or similar)
- Solid grounding in embeddings, approximate nearest neighbor search, and retrieval and ranking systems
- Experience designing indexing and refresh strategies, with data quality controls and safe backfills
- Demonstrated ability to define and run retrieval evaluation: building gold sets, choosing relevance metrics, analyzing failures by slice, and preventing regressions
- Strong tradeoff judgment across relevance, latency, cost, and operational complexity
It Would Be a Bonus If You Had
- Experience with hybrid retrieval (lexical and vector), learning to rank, or domain-specific reranking
- Experience integrating graph-structured context or knowledge graphs into retrieval
- Experience with evaluation and observability for LLM and retrieval systems, including drift, failure analysis, and regression prevention
- Experience with AWS-native retrieval and indexing architectures
- Experience in edtech, content, curriculum, or skills modeling
Growth & Impact - In This Role, You'll Be Expected To
Why Join Us
- Join us and help shape the future of education by turning cutting-edge AI into reliable product capabilities.
- We value div
Benefits
Additional Information
At Instructure, we believe in the power of people to grow and succeed throughout their lives. Our goal is to amplify that power by creating intuitive products that simplify learning and personal development, facilitate meaningful relationships, and inspire people to go further in their education and careers. We do this by giving smart, creative, passionate people opportunities to create awesome.
And that's where you come in: Our team builds AI-native capabilities, reusable AI systems, and shared infrastructure that power multiple products and workflows across the platform. We are looking for a Senior Applied Data Scientist to own retrieval and semantic systems end to end, as a core, reusable capability that multiple AI products depend on.
You will own the full retrieval vertical: vector store selection and operation, indexing and refresh pipelines, semantic and hybrid retrieval, reranking, and the evaluation systems that prove relevance is good and stays good. You will own retrieval-specific architecture and its day-to-day operation, while our infrastructure owner provides the underlying cloud, cluster, and CI substrate and our AI Platform engineers provide the general MLOps and service scaffolding you build on. You will work closely with product, engineering, and research partners to turn advanced AI ideas into reliable product capabilities used at scale.
Important note on scope: This is a deep individual-contributor specialist role. We are looking for someone who has owned a retrieval system in production, not someone who has only used a vector database in a prototype. Retrieval evaluation is central to this role: if you cannot measure relevance and catch regressions before they reach users, the system is not done.