Data Engineer
About the role
We're looking for a Data Engineer to help build and operate an AWS-native data platform that processes healthcare claims data and tokenized identifiers. You'll design and implement Spark-based pipelines that transform, intersect, and enrich tokenized datasets stored primarily as Parquet on S3 and queried via Athena and related AWS services. This environment intentionally avoids managed lakehouse platforms (e.g., no Databricks and no Snowflake), so you'll be doing "real" data engineering directly on AWS.
Responsibilities
- Build and maintain Spark pipelines to process large-scale Parquet datasets on S3.
- Implement tokenization workflows, including transit token → real token conversion and dataset intersection/join logic.
- Process and deliver healthcare claims datasets for matched individuals, ensuring accurate identity mapping and data integrity.
- Orchestrate data pipelines using Airflow and/or AWS-native orchestration tools when appropriate.
- Develop reliable, testable, and observable ETL/ELT processes (retries, idempotency, monitoring, reprocessing).
- Optimize performance and cost across Spark jobs, S3 partitioning/layout, and Athena query patterns.
- Contribute to dbt models when applicable (transformations, documentation, data quality checks).
- Collaborate with cross-functional stakeholders in a healthcare environment, with a strong focus on privacy and secure data handling.
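The tokenization and intersection work described in the responsibilities can be pictured with a toy, in-memory sketch: convert partner-specific transit tokens to a shared real token, then keep claims only for individuals present in a target population. Everything here is an assumption for illustration. The `TRANSIT_TO_REAL` lookup, the helper names `to_real_tokens` and `matched_claims`, and the field names `transit_token`, `token`, and `claim_id` are hypothetical; real token conversion is performed by a tokenization vendor's own process, and the production version would be a Spark join over Parquet on S3, not Python lists.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical lookup: transit token -> real (shared) token.
# Stands in for a vendor's conversion process; a plain dict is illustration only.
TRANSIT_TO_REAL: Dict[str, str] = {
    "tA1": "R100",
    "tA2": "R200",
    "tB9": "R200",  # a different partner's transit token for the same individual
    "tB7": "R300",
}


def to_real_tokens(rows: Iterable[dict]) -> List[dict]:
    """Replace each row's transit token with its real token; drop unmappable rows."""
    converted = []
    for row in rows:
        real = TRANSIT_TO_REAL.get(row["transit_token"])
        if real is None:
            continue  # unmapped token: a real pipeline would quarantine and count these
        out = {k: v for k, v in row.items() if k != "transit_token"}
        out["token"] = real
        converted.append(out)
    return converted


def matched_claims(population: Iterable[dict], claims: Iterable[dict]) -> List[dict]:
    """Keep only claims whose token appears in the population (a left-semi join)."""
    population_tokens: Set[str] = {row["token"] for row in population}
    return [c for c in claims if c["token"] in population_tokens]


if __name__ == "__main__":
    population = to_real_tokens([{"transit_token": "tA1"}, {"transit_token": "tA2"}])
    claims = to_real_tokens([
        {"transit_token": "tB9", "claim_id": "C1"},
        {"transit_token": "tB7", "claim_id": "C2"},
        {"transit_token": "tX0", "claim_id": "C3"},  # no mapping, so it is dropped
    ])
    print(matched_claims(population, claims))
```

Using a set of population tokens (a semi-join) rather than a plain inner join means a claim is never duplicated when the same token appears several times in the population. In PySpark the equivalent is typically `claims.join(population, "token", "left_semi")`.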
Required Qualifications
- 1-2 years of professional experience in Data Engineering.
- Strong experience with Apache Spark (PySpark or Scala), including joins, intersections, partitioning, and performance tuning.
- Strong hands-on experience with the AWS data stack, including:
  - Amazon S3 (Parquet datasets, partition strategies, data layout best practices)
  - Amazon Athena (SQL, query optimization, managing large datasets)
  - Familiarity with AWS-native data lake patterns (Glue Catalog; knowledge of Lake Formation concepts is a plus)
- Experience building and operating pipelines using Airflow (DAGs, scheduling, dependencies, backfills).
- Excellent SQL skills and solid data modeling fundamentals.
- Advanced English level: able to lead technical discussions, write clear documentation, and work directly with US-based stakeholders.
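The S3 partition strategies and Athena query optimization listed above usually rest on Hive-style `key=value` prefixes, which let Athena skip partitions that a `WHERE` clause on partition columns rules out. The minimal pure-Python sketch below builds such prefixes and loosely mimics that pruning. The bucket name `example-claims-bucket`, the `service_year`/`service_month` partition keys, and the helpers `partition_prefix` and `prune` are hypothetical, chosen only for illustration.

```python
from typing import Dict, List


def partition_prefix(bucket: str, table: str, partitions: Dict[str, str]) -> str:
    """Build a Hive-style S3 prefix, e.g. s3://bucket/table/year=2024/month=03/."""
    parts = "/".join(f"{key}={value}" for key, value in partitions.items())
    return f"s3://{bucket}/{table}/{parts}/"


def prune(prefixes: List[str], **filters: str) -> List[str]:
    """Keep only prefixes whose key=value segments satisfy every equality filter.

    A loose imitation of partition pruning: only partitions that can match
    the filter are kept, so the rest would never be scanned.
    """
    kept = []
    for prefix in prefixes:
        segments = dict(
            seg.split("=", 1) for seg in prefix.rstrip("/").split("/") if "=" in seg
        )
        if all(segments.get(k) == v for k, v in filters.items()):
            kept.append(prefix)
    return kept


if __name__ == "__main__":
    # Hypothetical layout: claims partitioned by service year and month.
    layout = [
        partition_prefix("example-claims-bucket", "claims",
                         {"service_year": y, "service_month": m})
        for y in ("2023", "2024")
        for m in ("01", "02")
    ]
    for p in prune(layout, service_year="2024"):
        print(p)
```

In Spark, `df.write.partitionBy("service_year", "service_month").parquet(path)` produces the same `key=value` directory layout. Choosing partition keys that match common query filters, while avoiding very high-cardinality keys that create many small files, is the main design lever.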
Preferred Qualifications
- Experience with dbt (dbt Core, tests, documentation, exposures).
- Familiarity with healthcare data (claims data, eligibility, member-level datasets).
- Experience with tokenization, identity resolution, or privacy-preserving data workflows.
- Knowledge of AWS security concepts such as IAM, KMS, encryption, and secure data handling.
- Experience running Spark on AWS (e.g., EMR) or Spark-on-containers architectures.
Tech Stack
- AWS-native architecture
- Amazon S3 + Parquet (core storage layer)
- Amazon Athena (query engine)
- Apache Spark (no Databricks)
- Airflow (orchestration)
- dbt (optional, as applicable)
Soft Skills
- Strong and empathetic leadership.
- Proven client-facing experience.
- Excellent communication skills.
- Strong expectation management abilities.
- Strategic mindset with a solution-oriented approach and strong decision-making skills.
Additional Information
We're looking for a Data Engineer to join Source Meridian.
About Source Meridian
Source Meridian is a software development company that works to solve the industry's most challenging problems in healthcare practices. We are laser-focused on specific technologies in the healthcare and life sciences industries: healthcare technology, artificial intelligence, and healthcare interoperability.