Build and maintain the ingestion pipelines - Apache Flink streaming jobs on Dataproc for HL7v2 and FHIR feeds, PySpark batch jobs for CCDA and CSV bulk loads. Implement against the CDM_ingest_mapping shared Python library that defines SourceSpec instances per source × format combination
Implement source-format parsers (HL7v2, CCDA, CSV, FHIR) as Python classes per the parser component spec. Write fixture-driven tests covering both well-formed inputs and DLQ-routing scenarios
Implement and maintain the synchronous Informatica MDM call pattern in the ingestion path - batched calls, timeout handling, circuit breaker behavior, DLQ routing for MDM failures. Implement the asynchronous MDM event consumer that applies ECI changes to CDM
Implement and maintain the shared dbt macro library - hash_key, scd2_merge, attribute_hash, restate_merge, audit_columns. Macros are the most-reused code; their correctness is non-negotiable and they require golden tests
Build the FHIR serialization layer - flat FHIR Iceberg tables (one per resource type) materialized via dbt, the PySpark bundling pipeline that produces FHIR Bundles for Kafka publication, and the FHIR validator integration that gates publication on US Core 6.1 conformance
Implement Cloud Composer DAGs to orchestrate dbt runs, batch ingestion jobs, maintenance operations (Iceberg compaction, snapshot expiration, orphan file cleanup), and data product refresh schedules
Implement and monitor data quality checks at every layer - dbt tests for staging and CDM, FHIR validator output for serialization, Iceberg metadata observations for storage health, freshness monitors at the source-to-CDM boundary
Participate in code reviews, on-call rotations, and incident response. The platform serves regulated healthcare workloads; operational stability is part of the engineering responsibility, not a separate function
Optimize pipeline performance - Flink TaskManager sizing, Iceberg compaction tuning, dbt incremental strategy selection, Starburst cluster scaling decisions. Profile production performance and propose changes when SLOs are at risk
Document the components you build through the SDD framework - every code file references its spec; every spec change is reviewed; every acceptance criterion has a corresponding test
Required Skills and Qualifications
Bachelor's or Master's degree in Computer Science, Engineering, or a related quantitative field
3+ years of hands-on data engineering experience
Strong proficiency in Python and SQL. PySpark and PyFlink familiarity strongly preferred. Some Java or Kotlin exposure useful for FHIR-Repository interceptor work (one o
Benefits
Health insurance
Paid time off
Additional Information
While technology is the heart of our business, a global and diverse culture is the heart of our success. We love our people and take pride in offering them a culture built on transparency, diversity, integrity, learning and growth.
If working in an environment that encourages you to innovate and excel, not just in your professional life but in your personal life too, interests you, you would enjoy your career with Quantiphi!
Data Engineer
Experience Range: 4 - 8 Years
Location: Mumbai, Bangalore, Trivandrum
Role overview:
The Data Engineer is the implementation backbone of the platform. You will build and operate the ingestion pipelines, the dbt transformation layer, the FHIR serialization pipelines, the FHIR-Repository integration components, and the data products that depend on them. You will work within a spec-driven development framework - every component begins with an approved specification, and your work is generated, refined, and verified against that specification.
This is a hands-on role. You will write Python, PyFlink, PySpark, dbt SQL, and supporting Java for FHIR-Repository interceptor work, with Code agents serving as coding assistants under your direction. Code review, test authorship, and operational ownership of the components you build are part of the role. The platform handles regulated healthcare data at high volume, and operational discipline matters.