Synthetic Data Engineer (AI Data/Training)

Hyphenconnect · Boston
Full-time · On-site · Posted 1mo ago
Airflow · Prompt Engineering · Spark

Responsibilities

  • Design domain-specific synthetic data generation (SDG) pipelines via self-instruct and constitutional prompting.
  • Implement automated quality scoring and de-duplication systems.
  • Manage data pipelines that feed directly into SFT and DPO training loops.

Requirements

  • Proven experience building large-scale data pipelines (Airflow, Spark, Ray).
  • Deep knowledge of prompt engineering for data generation.
  • Familiarity with dataset distillation and bias mitigation.

Additional Information

We are seeking a Synthetic Data Engineer. In this role, you will design and implement domain-specific synthetic data generation pipelines and maintain the quality of the data that feeds the organization's training loops. Your work will directly support data processing and model training.


Interested in this role?

Apply on the company's website.
