Senior Software Engineer - Real-Time Ingestion
About the role
Our platform is the foundational identity and data layer for 900M+ monthly active users, serving 2.5B+ profiles at massive scale. We are building a predictive, identity-centric insights engine, ensuring our audience is understood with precision to deliver hyper-personalized experiences and advertising solutions across all our digital properties. Our mission centers on first-party data strategy: capturing, enriching, and activating audience signals to build a 360-degree view of every user. We operate under a Privacy-by-Design philosophy, adhering to global regulations (GDPR, CCPA) and industry security standards, while leveraging a cloud-native stack across GCP (BigQuery, Spanner, Dataflow, Composer, GKE) and AWS, with modern MLOps practices to deliver measurable business impact.

As a Senior Data Engineer in the Consumer Data Organization (CDO), you will design and implement streaming data pipelines that process billions of user signals daily, maintaining a real-time view of 2.5B+ profiles. Your pipelines handle critical third-party ID mutations, behavioral signals, and identity updates with sub-second latency, ensuring data freshness for downstream activation and monetization use cases worth hundreds of millions in annual revenue.

You will build scalable Kafka-based streaming infrastructure processing millions of events per second, implementing Apache Beam/Dataflow jobs for stream processing, enrichment, and validation. Your work requires balancing extreme throughput requirements, data quality guarantees, and operational reliability while ensuring privacy-compliant handling of sensitive user data.

This role demands expertise in real-time streaming architectures, distributed messaging systems (Kafka, Pub/Sub), and production data engineering at massive scale. You will collaborate closely with Storage, Privacy, and Platform teams to ensure efficient data flow from ingestion to activation.