Additional Information
Apify is the largest marketplace of tools for AI. More than 30,000 Actors help people and agents get real-time web data, track competitors, generate leads, or integrate their apps. Actors are built by a global creator community that now earns more than $1M every month.
Join us to help people put the web to work. Apify can find missing children, protect consumers from fake discounts across the EU, and feed data to AI chatbots.
We're looking for a Data Engineer to own the integration layer between Snowflake and the operational tools that run Apify's go-to-market and product motion: HubSpot, Intercom, Mixpanel, and Segment. You'll make sure the right data lands in the right system at the right time, in the right shape, so Sales, Marketing, Customer Success, and Product teams can act on it.
You'll be the 9th member of the data team - joining a mix of analytical engineers, analysts, and data scientists - at the moment Segment is being rolled out as Apify's CDP. That's yours to land end-to-end.
What you'll be working on:
Own the integration domain end to end - all pipelines, transformations, and Snowflake models that connect HubSpot, Intercom, Mixpanel, and Segment to the rest of the platform, in both directions.
Design event tracking and the CDP layer with the RevOps team as Segment becomes the source of truth for behavioral data flowing into product, marketing, and CRM systems.
Build reliable, observable pipelines in Keboola and dbt - with clear data contracts, schema tests, freshness guarantees, and alerting.
Model integration data in Snowflake so HubSpot, Intercom, Mixpanel, and Segment data lands in well-defined tables that downstream consumers can trust, with documentation that analysts and scientists can actually use.
Power lifecycle automations - PQA scores back into HubSpot, behavioral campaigns in Intercom and customer.io, product usage signals - by shipping the data they depend on.
Diagnose and resolve pipeline incidents independently - trace lineage across multiple components, find root causes, fix, and write the runbook so it doesn't bite the next person.
Tech stack:
Snowflake - data warehouse
Keboola - extractors, writers, and orchestration
dbt - transformations on Snowflake (orchestrated by Keboola; this is where we're actively migrating existing transformation logic)
Tableau and Redash - BI
n8n - workflow automation
Segment - CDP, currently being rolled out end-to-end
Who we're looking for:
3+ years of data engineering experience, with meaningful time spent on integrations between a cloud warehouse and operational SaaS tools (HubSpot, Salesforce, Intercom, Zendesk, Mixpanel, Amplitude, Segment, RudderStack, or similar).
Fluent in SQL (window functions, CTEs, complex multi-source joins, query optimization) and comfortable in Python for the parts a no-code tool can't handle.
Production experience with Snowflake (or BigQuery, Databricks, Redshift), and an understanding of the cost, performance, and access-control tradeoffs of a usage-based warehouse.
Experience building end-to-end pipelines combining an orchestration or ELT platform (Keboola, Fivetran, Airflow, Dagster, Prefect, Matillion) with a transformation framework like dbt.
Hands-on experience with a CDP (Segment, RudderStack, mParticle) - tracking plans, schemas, identity resolution, downstream consumers - not just installing the snippet.
You think in data contracts - schema stability, freshness SLAs, documented field definitions - and treat the boundary between your domain and downstream consumers as a first-class interface.
Comfortable with reverse ETL (Census, Keboola, or hand-rolled), and you understand what it means to write back to a CRM that humans are also editing.
Pragmatic about tooling - happy to use n8n for the right job, and equally happy to write proper code when that's the right call.
Able to explain to non-technical stakeholders in Sales, Marketing, and Customer Success why a dashboard moved and what it means, in English, both in writing and in person.