Solutions Applied Data Scientist, Healthcare

Protege · Worldwide
Full-time · Remote · 1mo ago

Responsibilities

Technical Escalation & Delivery Collaboration

  • During delivery projects, Solutions Leads may encounter complex data challenges that require deeper analysis or technical problem-solving. You will act as a technical partner, helping to solve problems such as:
      • Complex cohort definitions that require multi-source joins
      • Linking datasets across different data partners
      • Investigating unexpected gaps or anomalies in delivered data
      • Evaluating whether requested variables or labels exist in available datasets
      • Determining whether a dataset can realistically satisfy model requirements
  • You will work collaboratively with Solutions Leads to unblock delivery challenges while keeping projects moving toward successful completion.
  • When solutions require infrastructure or pipeline changes, you will partner with the Solutions Engineer and internal platform engineering teams to implement the required workflows.

Cohort Definition & Dataset Construction

  • Work with Solutions Leads to translate customer requirements into concrete dataset logic. You will help ensure that datasets accurately represent the intended population and meet customer specifications.
  • Responsibilities include:
      • Writing complex SQL queries to construct cohorts
      • Implementing inclusion and exclusion logic
      • Joining datasets across multiple data sources
      • Validating linkage between datasets
      • Identifying and resolving inconsistencies or missing fields
  • Partner with Solutions Leads to resolve complex data questions that arise during project delivery
  • Escalate or collaborate with delivery engineers when dataset construction requires pipeline changes or large-scale data processing

Data Quality Validation & Completeness Analysis

  • Responsibilities include:
      • Performing data completeness analysis
      • Investigating missing or anomalous data
      • Verifying cohort logic results
      • Validating row counts and dataset structure
      • Creating summary statistics and validation outputs

Data Feasibility

  • Many customer projects involve AI researchers who are defining the healthcare datasets required to train or evaluate models. You will work with these customer teams to translate research goals into practical dataset specifications.
  • Responsibilities include:
      • Reviewing dataset requests from AI researchers and model development teams
      • Helping clarify and re

Benefits

Health insurance

Additional Information

Company Overview: We are building Protege to solve the biggest unmet need in AI: getting access to the right training data. The process today is time-intensive, incredibly expensive, and often ends in failure. The Protege platform facilitates the secure, efficient, and privacy-centric exchange of AI training data. Solving AI's data problem is a generational opportunity. We're backed by world-class investors and already powering partnerships with some of the most ambitious teams in AI. The company that succeeds will be one of the largest in AI, and in tech. We're a lean, fast-moving, high-trust team of builders who are obsessed with velocity and impact. Our culture is built for people who thrive on ambiguity, own outcomes, and want to shape the future of data and AI.

Role Overview: We are hiring a Solutions Applied Data Scientist to help design, construct, and validate complex healthcare data cohorts used for AI model training. This role sits within the delivery organization, working closely with Solutions Leads and delivery engineers to solve complex data challenges that arise during customer projects.

Solutions Leads own the customer relationship and overall delivery of projects. The Solutions Applied Data Scientist serves as their technical partner for more complex data problems, including cohort construction, multi-source dataset assembly, feasibility analysis, and data validation.

You will help translate customer requirements and research generated by Protege's Data Lab into practical dataset definitions, determine whether those requirements can be met with available data, and build the SQL and analysis needed to construct the resulting datasets. You will also collaborate with delivery engineers when solutions require changes to data pipelines, infrastructure, or large-scale data movement. This is a highly applied role focused on solving real-world dataset challenges, not on research or model development.

The ideal candidate enjoys solving messy real-world data problems, working directly with large healthcare datasets, writing complex SQL, and collaborating closely with cross-functional teams. Our environment has a lot going on as we grow, so we're looking for someone energized and excited by the fast pace of the industry and our company!