Synthetic Data Engineer (AI Data/Training)

Hyphenconnect · Seattle

Full-time · On-site · 1mo ago

Airflow · Prompt Engineering · Spark

Responsibilities

Design domain-specific synthetic data generation (SDG) pipelines via self-instruct and constitutional prompting.
Implement automated quality scoring and de-duplication systems.
Manage data pipelines that feed directly into SFT and DPO training loops.

Requirements

Proven experience building large-scale data pipelines (Airflow, Spark, Ray).
Deep knowledge of prompt engineering for data generation.
Familiarity with dataset distillation and bias mitigation.

Additional Information

We are seeking a Synthetic Data Engineer. In this role, you will design and implement domain-specific synthetic data generation pipelines and maintain the quality of the data that feeds the organization's model training loops.
