Data Scientist

External

Thenielsencompany · Bengaluru, IN

Full-timeOn-site1w ago

AirflowAWSDockerETLFeature EngineeringGCP

Cover Letter Connect

Prepare for this interview

Elite

AI-generated questions, company research, and talking points tailored to this role

Responsibilities

Advanced Statistical Modeling (The "Science" Side)
Incremental Reach Frameworks: * Small-N Datasets: Implement Bayesian Model Averaging (BMA) to cycle through regression combinations, providing robust coefficients and credible intervals when study data is limited.
Large-Scale Prediction: Deploy Gradient Boosted Regression Trees (GBM) to identify non-linear patterns and rank the impact of "Reach Drivers" (Media Weight, On-Target %, Frequency).
Audience Deduplication: Use Maximum Entropy (MaxEnt) models to estimate unique audience reach across fragmented platforms by reconciling census and panel data.
Additional Frameworks:
Mixed-Effect Models: Use Hierarchical/Multilevel modeling to account for nested data (e.g., campaigns nested within specific industry verticals).
Causal Lift: Apply Synthetic Control Methods to measure incremental shifts in behavior for campaigns with fixed timeframes where a clean control group is unavailable.
Data Engineering & Pipeline Architecture (The "Engineering" Side)
Python-Centric ETL: Architect and maintain robust data pipelines using Python (Pandas, PySpark) to ingest, clean, and harmonize data from Linear TV logs and Digital ad servers.
Feature Engineering: Automate the extraction of Base Drivers (GRP, Reach Efficiency, Seasonality) and Custom Drivers (Share of Voice, Flighting) into a supervised learning-ready schema.
Productionization: Wrap statistical models into production-grade APIs or scheduled containers (Docker/Airflow) to ensure repeatable and scalable measurement.
Cloud Operations: Manage large-scale datasets within Cloud Data Warehouses (Snowflake, AWS, or GCP), optimizing SQL queries for high-performance analytics.
Experimental Design & Methodology
Control/Test Logistics: Design scientifically valid Control and Test groups, ensuring proper randomization or using Propensity Score Matching to mitigate selection bias.
Variable Importance: Provide stakeholders with Posterior Inclusion Probabilities to identify which media levers (Duration, Weight, etc.) most consistently drive incremental reach.
Cross-Media Calibration: Reconcile Linear TV's "One-to-Many" metrics with Digital's "One-to-One" tracking to provide a unified view of the consumer.
Experience: 3-6 years of statistical model development and Mastery of Python (specifically for data manipulation and ML) and advanced SQL. Experience with PySpark or Dask for distributed computing is a plus.
Statistical Mastery: Proven experience with GBM (XGBoost/LightGBM) and Bayesian Frameworks (e.g., PyMC, Stan, or R-BMA) among other Data Science models.
Media Knowledge: Understanding of Linear TV vs. Digital dynamics, including Reach/Frequency, GRPs, and Deduplication logic.
Education: Bachelor's or Master's in a quantitative field (Statistics, Computer Science, Economics) or equivalent professional experience.

Additional Information

Role Overview As a Hybrid Data Scientist you will sit at the intersection of high-scale data pipelining and advanced statistical methodology. You will be responsible for the end-to-end lifecycle of Incremental Reach and Audience Measurement products-from architecting Python-based data pipelines to implementing sophisticated Bayesian and Machine Learning models that quantify the lift of Digital media over a Linear TV baseline.

Your Match

How well this role fits your profile.

Company Intel

What employees say

Worked at Thenielsencompany? Share your experience

Interested in this role?

Apply on the company's website.

Cover Letter Connect