NLP Data Scientist
Requirements
- Degree in computer science, computational linguistics, artificial intelligence, mathematics, physics, engineering, or related field
- Familiarity with NLP and machine learning proven through personal, university or work projects
- Experience building or adapting machine learning models end-to-end, with the ability to explain the decisions made
- Proficiency in Python (or similarly complex programming language), which is the primary programming language used
- Willingness to learn about unfamiliar languages and scripts
- Master's degree in relevant field
- Familiarity with the scientific and ML-related Python stack: NumPy, SciPy, scikit-learn, TensorFlow, PyTorch, PySpark, Hugging Face Transformers, spaCy
- Familiarity working with LLMs, LLM APIs, agentic systems
- Experience in working with multiple languages and scripts
- Experience with integrating the results of research into user-facing products
About Quantexa
We're made up of people from 47 nationalities who speak over 20 languages, and nearly half of our colleagues come from an ethnic or religious minority background.
If this sounds like a place you'd want to work, we'd love to hear from you!
Our perks and quirks.
What makes you Q will help you to realize your full potential, flourish and enjoy what you do, while being recognized and rewarded with our broad range of benefits.
Benefits
Additional Information
We're hiring an NLP Data Scientist to work on challenging NLP problems that underpin Quantexa's entity resolution product: messy and multilingual text, creating training data from scratch, and careful evaluation.
The role sits in our NLP Centre of Excellence, which builds NLP capabilities for Quantexa's products, an increasingly important area as the platform is applied to unstructured text. Our work ranges from shipping production models for tasks like named entity recognition, relation extraction, and text classification, through to more exploratory research that drives improvements in our products and the wider field. We work across the full spectrum of model complexity: from fast, explainable models, through fine-tuned small language models, to agentic LLM-based systems.
As an NLP Data Scientist, you'll develop these NLP components end-to-end: framing the problem, building training and testing datasets, training and evaluating models, handing over for production, and improving accuracy based on feedback. Near-term, the focus is on building lightweight, fast NLP models where careful design of training data and evaluation benchmarks is critical.
You'll be involved in:
- Translating underspecified business problems into well-scoped ML tasks
- Training, evaluating, and improving NLP models end-to-end through reproducible pipelines
- Sourcing, building, and curating training and evaluation datasets from heterogeneous, often imperfect inputs
- Working across languages and scripts, such as Arabic, Japanese, Chinese, and others
- Developing and using LLM-assisted workflows for dataset generation, error analysis, and model improvement iterations
- Collaborating with engineering and product teams to turn ideas into integrated product features
- Presenting findings to stakeholders