Product Data Lead

External
Impact · Cape Town, South Africa
Full-time · On-site · 3w ago
Classification · Compliance · Feature Engineering · Leadership

About the role

We're seeking a Lead Data Scientist specializing in Product Data Quality to join our Cape Town Data Science team. This role combines deep individual contribution with meaningful technical leadership across the team. You'll own the most complex, highest-leverage work in product data quality, spanning catalog hygiene, transaction matching, classification modeling, deduplication, and global product identity. You work across both the structured catalog universe and the messier, larger-scale sales transaction universe, building models and infrastructure that power search, recommendations, and business intelligence. Beyond your own delivery, you'll set technical standards, mentor senior scientists, and drive cross-functional alignment in ways that multiply the team's output.

This role demands both analytical depth and strong engineering capability. You take models from research to production, build scalable data pipelines, and create monitoring infrastructure that makes the product data foundation trustworthy and continuously improving. You also bring a systems perspective, identifying the architectural and process changes that prevent whole classes of data quality problems from recurring. Your work directly influences search relevance, recommendation quality, match rates, and reporting accuracy across the business.

Core Responsibilities

Product Classification & Taxonomy Modeling

Develop, deploy, and maintain ML models for automated product categorization and taxonomy assignment across hierarchical category structures, owning the most architecturally complex modeling challenges in the domain.

Drive step-change improvements in classification accuracy through advanced feature engineering (text, attributes, embeddings), model architecture decisions, and rigorous evaluation on both catalog and sales transaction data.

Define production monitoring standards for classification models; establish drift detection patterns, retraining triggers, and quality SLOs that others on the team adopt.

Act as the technical authority on taxonomy edge cases, new product types, and evolving category structures; collaborate with category experts and Product teams to shape taxonomy definitions.

Catalog & Sales Universe Data Quality

Lead deep-dive analyses into catalog completeness, consistency, and correctness across retailers, categories, and product attributes.

Own data quality analytics for the sales transaction universe (a larger, messier dataset than catalog), measuring match rates, diagnosing systemic gaps, and identifying root causes of unmatched transactions and misattributed products.

Define and evolve the canonical KPI framework for catalog and transaction health (attribute coverage, schema compliance, match rates, GPID coverage, freshness); build and maintain the monitoring systems that make quality trends visible and actionable.

Drive remediation of systemic quality failures; translate findings into cross-functional recommendations that produce durable process and data improvements.

Global Product ID (GPID) Coverage & Matching

Assess and own GPID coverage and accuracy across both catalog and sales transaction data; develop the analytical view of gaps by category, retailer, and brand.

Architect and improve matching algorithms to link sales transactions to catalog products, handling missing GPIDs, naming inconsistencies, and category misclassification at scale by combining rule-based, probabilistic, and learned approaches.

Quantify the downstream impact of GPID enrichment and matching improvements on search, deduplication, and reporting; use this to drive partner and brand engagement via the Tiger Team.

Deduplication & Entity Resolution

Lead the design and implementation of deduplication pipelines that handle catalog and transaction data at scale; define the architectural patterns, heuristics, and ML-based approaches for variant grouping and entity resolution.

Set the quality bar for precision/recall tradeoffs in duplicate detection; establish evaluation frameworks the br

Benefits

Health insurance

Additional Information

About impact.com

impact.com is the world's leading commerce partnership marketing platform, transforming the way businesses grow by enabling them to discover, manage, and scale partnerships across the entire customer journey. From affiliates and influencers to content publishers, brand ambassadors, and customer advocates, impact.com empowers brands to drive trusted, performance-based growth through authentic relationships. Its award-winning products, Performance (affiliate), Creator (influencer), and Advocate (customer referral), unify every type of partner into one integrated platform. As consumers increasingly rely on recommendations from people and communities they trust, impact.com helps brands show up where it matters most. Today, over 5,000 global brands, including Walmart, Uber, Shopify, Lenovo, L'Oréal, and Fanatics, rely on impact.com to power more than 225,000 partnerships that deliver measurable business results.


Interested in this role?

Apply on the company's website.
