Senior AI Engineer - Pre-training Data (f/m/d)
About the role
As a Senior AI Engineer in Pre-training Data, you will work across the full stack of data preparation - from sourcing and acquisition to processing, filtering, and mixture design. Some weeks you'll be deep in data quality analysis, understanding what makes a corpus valuable and how its composition affects downstream performance on public and bespoke evaluation tasks. Other weeks you'll be optimising large-scale processing pipelines or building tooling that gives the team visibility into what our models are actually training on. And some weeks you'll be reading the latest research on pre-training data methods and translating findings into experiments you can run against our stack.

We approach data work in an evidence-based way. Decisions about filtering strategies, data mixtures, and quality thresholds are backed by ablations - you'll design and run targeted experiments to validate that your data choices actually improve model outcomes.

We are looking for someone who combines significant research experience (in industry or academia) with high engineering competence. Your work is high-leverage: the data you source, curate, and synthesise directly determines what our models learn, how well they perform, and where they fall short. You'll have direct influence on the models we ship.