Data Quality Intern (d/f/m)

Luminovo · 🇪🇺 Europe (remote)

Full-time · Remote · 5d ago

PostgreSQL · Python · SAFe · SQL · TypeScript

Benefits

Performance bonus

Additional Information

👑 Your opportunity

Our software platform is built to solve engineering and procurement issues in the trillion-dollar electronics industry. In other words: Luminovo is redefining the software stack used to bring any type of electronics to life. To understand what we do, check out our website, plus two of our blog posts (here and here).

🚀 Your role

Our data quality mission is product discovery applied to our part and component data. You take a fuzzy quality problem, figure out what it actually means for customers, measure it honestly, and hand a well-scoped, evidence-backed finding to the team that delivers the larger fix. The hard part isn't running a query (our AI tooling helps with that). It's reframing "x% of parts have no pin count" into "y% of a customer's costings can't complete because of it," then giving other teams a result they can act on without having to re-check it.

You'll be a junior version of this discovery loop: sharp, honest, and data-fluent. You'll follow threads the team doesn't have time to chase, turn them into decision-ready findings, and grow into more autonomy over the course of your internship. You work within a clear direction, and you can take a fix all the way into production when it's a data-level change you can script, like manufacturer merges or backfills.

You won't need to be a Rust engineer or own large refactors. AI tooling does the heavy lifting on unfamiliar code and scripting. Your judgment and rigor are what matter most.

This role is an internship with a duration of three to six months.

🎯 Your performance objectives

- Turn ambiguous data-quality questions into customer-relevant findings by reframing part-level observations into business/customer impact (e.g. tenant-aware "what actually blocks costing"), defining a sensible metric or proxy, and producing a measured, caveated answer to the question set by the product manager.
- Independently size problems and test hypotheses against our data by writing read-only queries over the data warehouse (ClickHouse) and production Postgres, and producing numbers you can defend (knowing when a result is double-counted, misleading, or too good to be true).
- Make the effect of fixes and experiments visible by extending our dashboards and building ad-hoc visualizations that show trends, baselines, and whether an intervention actually moved coverage/correctness.
- Run small experiments to gather evidence by writing scripts (with AI assistance) against external sources such as SiliconExpert and DigiKey, e.g. to check whether a missing-data gap is fetchable, calibrate a finding, or do spot checks on interesting cases.
- Verify assumptions in the product itself by navigating the epibator (Rust/TS) codebase with AI tooling to confirm how data is actually resolved/used, and occasionally adding light instrumentation we find we need, without owning large refactors.
- Apply the fixes you've scoped, safely, by writing AI-assisted scripts that correct production customer data at scale: e.g. automating the research to decide whether two manufacturers are the same record and then executing thousands of merges.
- Make every change safe by construction: dry-run and validate against samples first, work in reversible/checkpointed batches, and put guardrails in place so we never introduce regressions or corrupt manufacturing/costing data.
- Leave behind durable, trustworthy knowledge by following the mission's loop (brief, investigate, report, distill), citing evidence, dating facts, and writing findings other teams and stakeholders can act on without re-deriving them.
- Be your own harshest critic by reconciling and sanity-checking your own results, clearly separating "what's proven" from "what's still a hypothesis," and flagging loudly when a finding overturns a prior assumption (including your own).

💪 What you bring

Discovery instinct and intellectual honesty are what set this role apart, sitting on top of an analytics baseline. Deep technical and domain depth is coachable and AI-assisted. Calibration and judgment are not.

- You can pull a trustworthy number out of messy data with SQL
- You own the question, not just the query, and reframe it toward what actually matters
- You sanity-check your own work and say "not proven yet" when that's the truth
- You communicate findings clearly enough for a non-analyst to act on
- You use AI with real verification, and write small Python scripts to fix data safely
- You bring high agency, learn fast, and don't drop threads
- Bonus: you read code with AI help, and you're curious about the electronics domain

💻 Our tech stack

We always try to use the best tool for the job. Don't worry, we don't need you to be familiar with all of these:

- ClickHouse as our data warehouse and PostgreSQL for our production data, the two main places you'll query.
- Python for scripting, data transformation, and safe bulk fixes.
- Rust and TypeScript in our main product codebase, which you'll read (with AI help) to verify behavior.
- External data sour

