Design, build, and operate production-grade, Python-based data pipelines that power ML workflows, including real-time and near-real-time data processing and aggregations.
Own end-to-end data quality and reliability by implementing validations, automated testing, monitoring, and alerting - with clearly defined SLAs and ownership boundaries.
Strengthen and document our data foundations and governance, covering architecture, data lineage, dataset definitions, and dependency management.
Partner closely with the Risk Platform, ML Data Platform, and cross-functional teams - including data scientists, software engineers, and analysts - to align on requirements and deliver changes safely to production.
Drive Feature Store governance, improve usability, and standardize our Feature Store setup.
You'll be great for this role if...
4+ years of experience as a Data Engineer, focusing on building and deploying scalable data solutions.
Strong proficiency in Python and PySpark, with solid software engineering skills for designing computationally efficient solutions in large-scale production environments.
Experience with feature stores and feature engineering pipelines (building and maintaining enriched attributes for online/offline use).
Experience building and maintaining data pipelines, with a good understanding of API design, containers (Docker), and version control (Git).
Experience with AWS technologies (S3, EKS, Keyspaces, Athena) or those of an equivalent cloud provider.
Experience with streaming or event-driven architectures (e.g., Kafka) and familiarity with open table formats (e.g., Apache Iceberg, Delta Lake).
Requirements
Experience working in Fintech or Risk & Compliance domains
Experience with embeddings/representation learning pipelines
Why you should join SumUp
Opportunity to work with SumUppers globally on large-scale fintech products used by millions of businesses worldwide, from our Berlin office. This involves an office-first setup.
Commitment to Diversity and Inclusion: be part of a workplace that values and promotes diversity, fostering an inclusive environment where everyone's perspectives are respected and embraced.
Enrolment in our Virtual Stock Option programme: you will own a stake in SumUp's future success.
A dedicated annual L&D budget of €2000 for your individual development, which can be used to attend conferences and/or advance your career through further education.
A corporate pension scheme where we match up to 20% of your contributions.
Generous time off: enjoy 28 days of paid leave plus public holidays and special leave days.
Numerous other benefits, such as Urban Sports Club subsidy, Kita placement assistance, and subsidised office lunches.
Break4me: 1-month sabbatical after 3 years of service.
Referral Bonus: earn additional rewards by referring talented individuals to join the SumUp team.
About SumUp
Be empowered to do more that matters.
At SumUp, we're on a mission to empower small businesses across the globe by providing simple and affordable tools that allow them to thrive. Today, over 4 million businesses in 37 markets rely on SumUp as their financial partner to manage payments, finance, and customer relationships.
SumUp is proud to be an Equal Employment Opportunity emp
Additional Information
Berlin, Germany | Full-time | Office-first
Team description
Card Not Present (CNP) Protect is a cross-functional team within the Risk & Compliance tribe that's shaping SumUp's global approach to fraud prevention. Our team is responsible for fraud prevention in our card-not-present products. We build cutting-edge systems and ML solutions for automated decision-making in the CNP space.
The Senior Data Engineer role is critical for creating a robust, high-quality data foundation that drives efficiency and enables data-driven decisions for the Risk & Compliance teams. You will build and operate the production data layer that powers automated decisioning for CNP fraud detection. You will own feature jobs and work closely with the ML Platform team to build and scale the pipelines. Our roadmap includes building infrastructure to generate and publish foundation model embeddings.