Senior Applied Data Scientist, Retrieval and Semantic Systems

Instructure · Budapest, Hungary

Full-time · Remote · Posted 2 weeks ago

AWS · FastAPI · Machine Learning · MLOps · Observability · Python

Responsibilities

Design, build, and ship production retrieval systems that power AI product capabilities across multiple products
Own vector store selection and operation, including scalability, latency, reliability, cost, and multi-tenant design
Build and operate indexing and refresh pipelines: chunking, embedding generation, backfills, deletes, and versioned indices
Implement semantic and hybrid retrieval: embeddings, similarity search, lexical and vector combination, metadata filtering, and reranking
Own retrieval evaluation as a first-class system: gold sets, offline relevance metrics, slice analysis, drift detection, and regression gates that block bad changes from shipping
Make and defend the core tradeoffs of the domain: relevance against latency against cost against operational complexity
Partner with AI Platform and infrastructure engineers on deployment, observability, and reliability, and with product and research partners on relevance requirements

Requirements

6+ years of experience building and shipping production machine learning or applied AI systems
Proven ownership of a retrieval system in production, including vector store selection and operation
Strong Python skills and experience building services and APIs (for example, FastAPI or similar)
Solid grounding in embeddings, approximate nearest neighbor search, and retrieval and ranking systems
Experience designing indexing and refresh strategies, with data quality controls and safe backfills
Demonstrated ability to define and run retrieval evaluation: building gold sets, choosing relevance metrics, analyzing failures by slice, and preventing regressions
Strong tradeoff judgment across relevance, latency, cost, and operational complexity

It Would Be a Bonus If You Had

Experience with hybrid retrieval (lexical and vector), learning to rank, or domain-specific reranking
Experience integrating graph-structured context or knowledge graphs into retrieval
Experience with evaluation and observability for LLM and retrieval systems, including drift, failure analysis, and regression prevention
Experience with AWS-native retrieval and indexing architectures
Experience in edtech, content, curriculum, or skills modeling

Growth & Impact - In This Role, You'll Be Expected To

Why Join Us

Join us and help shape the future of education by turning cutting-edge AI into reliable product capabilities.
We value div

Benefits

Performance bonus

Additional Information

At Instructure, we believe in the power of people to grow and succeed throughout their lives. Our goal is to amplify that power by creating intuitive products that simplify learning and personal development, facilitate meaningful relationships, and inspire people to go further in their education and careers. We do this by giving smart, creative, passionate people opportunities to create awesome.

And that's where you come in. Our team builds AI-native capabilities, reusable AI systems, and shared infrastructure that power multiple products and workflows across the platform. We are looking for a Senior Applied Data Scientist to own retrieval and semantic systems end to end, as a core, reusable capability that multiple AI products depend on.

You will own the full retrieval vertical: vector store selection and operation, indexing and refresh pipelines, semantic and hybrid retrieval, reranking, and the evaluation systems that prove relevance is good and stays good. You will own retrieval-specific architecture and its day-to-day operation. Our infrastructure owner provides the underlying cloud, cluster, and CI substrate, and our AI Platform engineers provide the general MLOps and service scaffolding you build on. You will work closely with product, engineering, and research partners to turn advanced AI ideas into reliable product capabilities used at scale.

Important note on scope: this is a deep individual-contributor specialist role. We are looking for someone who has owned a retrieval system in production, not someone who has only used a vector database in a prototype. Retrieval evaluation is central to this role: if you cannot measure relevance and catch regressions before they reach users, the system is not done.