Senior Data Scientist, Product Data

Impact · Cape Town, South Africa
Full-time · On-site · 4mo ago
Classification · Clustering · Compliance · Feature Engineering


About the role

We're seeking a Senior Data Scientist specializing in Product Data Quality to join our Cape Town Data Science team. In this role, you'll own the analytical and technical foundation of product data quality across our ecosystem, spanning catalog hygiene, transaction matching, classification modeling, deduplication, and global product identity. You'll work across both the structured catalog universe and the messier, larger-scale sales transaction universe, building models and infrastructure that power search, recommendations, and business intelligence.

This is a high-impact role that demands both analytical depth and strong engineering capabilities: you'll take models from research to production, build scalable data pipelines, and create the monitoring infrastructure that makes our product data foundation trustworthy and continuously improving. Your work will directly influence search relevance, recommendation quality, match rates, and reporting accuracy across the business.

Core Responsibilities

Product classification & taxonomy modeling

Develop, deploy, and maintain ML models for automated product categorization and taxonomy assignment across hierarchical category structures. Improve classification accuracy through feature engineering (text, attributes, embeddings), model iteration, and robust evaluation on both catalog and sales transaction data. Monitor production model performance; identify and remediate misclassification patterns that impact search, recommendations, and reporting. Collaborate with category experts and Product teams to refine taxonomy definitions, handle edge cases, and adapt to new product types.

Catalog & sales universe data quality

Conduct deep-dive analyses into catalog completeness, consistency, and correctness across retailers, categories, and product attributes.
Own data quality analytics for the sales transaction universe (a larger, messier dataset than the catalog), measuring match rates, diagnosing gaps (unmatched transactions, misattributed products), and identifying systematic failures. Define and track catalog and transaction health KPIs (attribute coverage, schema compliance, match rates, GPID coverage, freshness); identify root causes and drive remediation. Build monitoring systems and dashboards to track data quality trends across retailers, categories, and time periods.

Global Product ID (GPID) coverage & matching

Assess GPID (GTIN/UPC/EAN) coverage and accuracy across both catalog and sales transaction data; identify gaps by category, retailer, and brand. Build and improve matching algorithms to link sales transactions to catalog products, handling missing GPIDs, naming inconsistencies, and category misclassification. Quantify the impact of GPID enrichment and matching improvements on search, deduplication, and reporting accuracy. Partner with external data providers and brands to improve GPID coverage and resolve identifier conflicts.

Deduplication & entity resolution

Identify product variants (size, color, packaging) and duplicates within and across retailer catalogs using clustering, entity resolution, embeddings, and similarity-based techniques. Build scalable deduplication pipelines that handle catalog and transaction data at scale; define patterns, heuristics, and ML-based approaches for variant grouping. Measure the impact of deduplication on search quality, recommendation accuracy, and reporting; iterate on models to reduce false positives and improve precision. Support Data Engineering and Platform teams in productionizing deduplication and entity linking infrastructure.

Manufacturer data quality & brand engagement

Evaluate the consistency and accuracy of manufacturer-level attributes (brand name, MPN, manufacturer identifiers) across catalogs and transactions.
Detect systemic issues at the brand and retailer level; build scorecards and engage brands (via the Tiger Team) to drive data quality improvements. Create feedback loops to measure manufacturer data quality and track progress on remediation initiativ

Benefits

Health insurance

Additional Information

About impact.com

impact.com is the world's leading commerce partnership marketing platform, transforming the way businesses grow by enabling them to discover, manage, and scale partnerships across the entire customer journey. From affiliates and influencers to content publishers, brand ambassadors, and customer advocates, impact.com empowers brands to drive trusted, performance-based growth through authentic relationships. Its award-winning products, Performance (affiliate), Creator (influencer), and Advocate (customer referral), unify every type of partner into one integrated platform. As consumers increasingly rely on recommendations from people and communities they trust, impact.com helps brands show up where it matters most. Today, over 5,000 global brands, including Walmart, Uber, Shopify, Lenovo, L'Oréal, and Fanatics, rely on impact.com to power more than 225,000 partnerships that deliver measurable business results.

