Data Engineer


Source Meridian · Medellín, Colombia

Full-time · On-site · 2w ago

Airflow · Apache · AWS · Data Modeling · dbt · Documentation


About the role

We're looking for a Data Engineer to help build and operate an AWS-native data platform processing healthcare claims data and tokenized identifiers. You'll design and implement Spark-based pipelines that transform, intersect, and enrich tokenized datasets stored primarily as Parquet on S3, queried via Athena and related AWS services. This environment intentionally avoids managed lakehouse platforms (e.g., no Databricks and no Snowflake); you'll be doing "real" data engineering directly on AWS.

Responsibilities

Build and maintain Spark pipelines to process large-scale Parquet datasets on S3.
Implement tokenization workflows, including transit token → real token conversion and dataset intersection/join logic.
Process and deliver healthcare claims datasets for matched individuals, ensuring accurate identity mapping and data integrity.
Orchestrate data pipelines using Airflow and/or AWS-native orchestration tools when appropriate.
Develop reliable, testable, and observable ETL/ELT processes (retries, idempotency, monitoring, reprocessing).
Optimize performance and cost across Spark jobs, S3 partitioning/layout, and Athena query patterns.
Contribute to dbt models when applicable (transformations, documentation, data quality checks).
Collaborate with cross-functional stakeholders in a healthcare environment, with a strong focus on privacy and secure data handling.

Required Qualifications

1-2 years of professional experience in Data Engineering.
Strong experience with Apache Spark (PySpark or Scala), including joins, intersections, partitioning, and performance tuning.
Strong hands-on experience with the AWS data stack, including:
Amazon S3 (Parquet datasets, partition strategies, data layout best practices)
Amazon Athena (SQL, query optimization, managing large datasets)
Familiarity with AWS-native data lake patterns (Glue Catalog, Lake Formation concepts are a plus)
Experience building and operating pipelines using Airflow (DAGs, scheduling, dependencies, backfills).
Excellent SQL skills and solid data modeling fundamentals.
Advanced English level: able to lead technical discussions, write clear documentation, and work directly with US-based stakeholders.

Requirements

Experience with dbt (core, tests, documentation, exposures).
Familiarity with healthcare data (claims data, eligibility, member-level datasets).
Experience with tokenization, identity resolution, or privacy-preserving data workflows.
Knowledge of AWS security concepts such as IAM, KMS, encryption, and secure data handling.
Experience running Spark on AWS (e.g., EMR) or Spark-on-containers architectures.

Tech Stack

AWS-native architecture
Amazon S3 + Parquet (core storage layer)
Amazon Athena (query engine)
Apache Spark (no Databricks)
Airflow (orchestration)
dbt (optional, as applicable)

Soft Skills

Strong and empathetic leadership.
Proven client-facing experience.
Excellent communication skills.
Strong expectation management abilities.
Strategic mindset with a solution-oriented approach and strong decision-making skills.

Benefits

✔ Permanent contract
✔ Learning and continuous growth environment 🚀
✔ Benefits package focused on health and well-being 🎉
✔ Competitive salary based on experience 💰
✔ Health insurance

📍 Apply only if you reside in Colombia or Ecuador.

At Source Meridian, you'll be part of a high-impact tech-health company, building products that truly make a difference.

If you meet the profile, or know someone who might be interested, apply now! We'd love to meet you 💬

Additional Information

We're looking for a Data Engineer to join Source Meridian.

About Source Meridian

Source Meridian is a software development company that works to solve the industry's most challenging problems in healthcare practices. We are laser-focused on specific technologies in the healthcare and life science industries: healthcare technology, artificial intelligence, and healthcare interoperability.
