Data Engineer
About the role
The Velankar team maintains macromolecular structure databases that form essential resources for biologists and other life scientists worldwide. PDBe is a founding partner of the Worldwide Protein Data Bank organisation, which maintains the Protein Data Bank (PDB), the global archive of 3D structural data on macromolecules. The PDBe team also develops the PDBe Knowledge Base (PDBe-KB) and the AlphaFold Protein Structure Database (AFDB). The team is international and interdisciplinary, and consists of expert data curators, bioinformaticians, scientific software developers and IT specialists.

Your role

We seek a skilled and motivated Data Engineer to join our dynamic team. As a Data Engineer, you will play a crucial role in optimising and enhancing our data pipelines, ensuring efficient data processing, storage and retrieval. You will work closely with cross-functional teams to analyse requirements, propose new data pipeline architectures, and implement solutions that improve performance and scalability.

The tasks for this post include the following:
- Analyse existing data pipelines and identify areas for improvement, optimisation and scalability.
- Work closely with bioinformaticians and annotators to integrate data pipelines with existing systems and applications.
- Monitor data pipeline performance, troubleshoot issues, and implement solutions to ensure reliability and efficiency.
- Stay current with industry trends and best practices in data engineering, and recommend new technologies or tools to enhance the data infrastructure.
- Document data pipelines, processes and workflows for internal reference and knowledge sharing.

Join us in shaping the future of structural biology data. In this role, you'll use your IT skills and creative ideas to support and scale vital resources such as the PDB, PDBe, PDBe-KB and AFDB, ensuring they remain robust, sustainable and ready for tomorrow's scientific challenges.
You have
- An MSc in computer science, IT or a related field, or in bioinformatics with demonstrated IT expertise
- Expertise in data modelling and advanced SQL
- Proficiency in Python programming
- Proficiency in ETL (Extract, Transform, Load) processes and tools for large-scale data processing
- A strong understanding of relational databases, with hands-on experience across multiple RDBMS platforms:
  - PostgreSQL: deep knowledge of PostgreSQL database architecture, performance tuning, partitioning strategies, indexing techniques and query optimisation
  - Oracle: extensive experience with Oracle databases, including PL/SQL, Oracle-specific features and performance optimisation
  - MySQL/MariaDB: familiarity with alternative RDBMS platforms for data migration and compatibility scenarios
- Experience with database migration, including proven experience in migrating databases between different RDBMS platforms, specifically:
  - Oracle to PostgreSQL migration: hands-on experience with Oracle to PostgreSQL migration projects, including understanding of the compatibility layer (pg_proguard), data type mapping, stored procedure conversion, trigger migration, and handling Oracle-specific features in PostgreSQL
  - Data migration best practices: experience with migration tools such as Oracle Data Pump, GoldenGate and custom ETL scripts, and with data validation strategies
  - Migration planning: ability to plan and execute migration projects, including downtime management, data consistency verification and rollback strategies
  - Cross-platform optimisation: knowledge of leveraging PostgreSQL features to improve performance during migration scenarios
- Proficiency in data warehousing (Redshift, BigQuery)
- Strong communication and collaboration skills, with the ability to work effectively in a team environment
- Proficiency in oral and written English

You might also have
- A PhD in computer science, IT or a related field, or in bioinformatics with demonstrated IT expertise
- Experience with big data technologies and frameworks, such as Apache Spark, Hadoop or similar platforms
- Hands-on experience with CI/CD (GitLab CI/GitHub Actions)
- Familiarity with Java
- Familiarity with Google Cloud Platform or AWS
- Familiarity with data modelling techniques for AI (Artificial Intelligence) and ML (Machine Learning) applications
- Familiarity with Neo4j or other graph databases is an added advantage
- Familiarity with data visualisation (Tableau, Power BI)
- Knowledge of, or affinity with, structural biology and bioinformatics
- Experience working in international teams

Other helpful information
Hybrid working: At EMBL-EBI we are pleased to offer hybrid working options for all our employees. A dedicated desk is available every day, but our team works two days on site and three from home.
Interviews: We plan to hold first-round technical introductory meetings with selected candidates remotely, starting in early July 2026.
Contract length: Grant-based contract for 3 years
Salary: Grade 5 monthly salary starting at £3,303 per month a