Bioinformatician

External

EMBL · Hinxton, Cambridgeshire

Contract · Hybrid · Today

CI/CD · Documentation · Elixir · Git · Linux · Machine Learning

About the role

We are seeking a Bioinformatician with expertise in data integration and experience in structural biology and molecular dynamics (MD) data to join the Velankar team at the European Bioinformatics Institute (EMBL-EBI). The Protein Data Bank in Europe (PDBe) team develops essential macromolecular structure resources and tools for biologists and other life scientists. As a founding partner of the Worldwide Protein Data Bank, we are responsible for maintaining the global archive of experimentally determined macromolecular structures, the Protein Data Bank (PDB). We also manage the community-led PDBe Knowledge Base (PDBe-KB) resource and the AlphaFold Protein Structure Database (AFDB), a collaboration with Google DeepMind. The PDBe team consists of an international and interdisciplinary group of scientists, software engineers, and data engineers who develop a range of tools and services that support structure deposition, data integration, and advanced search capabilities for structural biologists and the wider life sciences community.

This position offers an exciting opportunity to contribute to the Horizon Europe-funded MD4SB (Molecular Dynamics for Structure-Based Biology) project. MD4SB is a major European research infrastructure initiative aiming to transform structural biology by integrating molecular dynamics simulation data into the wider life sciences ecosystem. The project brings together ELIXIR, Instruct-ERIC, EU-OPENSCREEN, HPC centres, AI factories, and pharmaceutical industry partners to develop FAIR, AI-ready infrastructure for structural ensemble data and molecular simulations.

Your role

You will contribute to the development of infrastructure connecting molecular dynamics simulations with structural biology resources and biological knowledge bases. A major component of the role will be developing AI-driven approaches to mine scientific literature and automatically extract experimental and biological metadata to enrich MD datasets.

You will develop and extend SIFTS (Structure Integration with Function, Taxonomy and Sequence), a core PDBe resource that provides residue-level mappings between PDB structures, UniProtKB sequences, and other biological resources, to facilitate the integration of MD-derived insights across the wider life sciences data ecosystem. This is an interdisciplinary role combining structural bioinformatics, molecular dynamics, and scientific software development. You will apply both scientific understanding and technical expertise to develop data integration workflows, APIs, and biological annotations that improve interoperability and reuse of structural and molecular simulation data across various resources.

Primary responsibilities:

- Design and implement data integration pipelines that connect MDDB with major life science resources, including PDBe, UniProt, PDBe-KB, and other relevant knowledge bases and databases
- Develop and deploy AI- and machine learning-based approaches for extracting experimental and biological metadata from scientific literature to enrich MDDB datasets and support downstream biological interpretation
- Extend and maintain the SIFTS infrastructure and codebase to support integration of molecular dynamics and other data resources
- Develop and maintain software tools, APIs, workflows, and documentation that facilitate FAIR data integration, metadata enrichment, and integrated data access
- Collaborate with domain experts, software engineers, and data resource providers to enable the integration of MD-derived biological insights into the wider life sciences data ecosystem
- Support FAIRification, standardisation, and interoperability of MD datasets and associated annotations
- Collaborate with international partners across ELIXIR, Instruct-ERIC, EU-OPENSCREEN, HPC centres, and industry
- Participate in community standards development, technical documentation, training, outreach, and dissemination activities

You have

- A PhD in Bioinformatics, Computational Biology, Structural Biology, Computer Science, Data Science, or a related field
- Familiarity with structural biology and molecular simulation data
- Experience with NLP/LLM-based scientific literature mining
- Demonstrated experience with FAIR data principles, metadata standards, and scientific repositories
- Understanding of sequence, structure, and functional annotations of proteins
- Experience in scientific software development, preferably in Python
- Experience with Linux environments, Git, and CI/CD practices
- Scientific publications relevant to structural biology, bioinformatics, or protein annotations
- Strong communication, collaboration, and problem-solving skills

You may also have

- Postdoctoral research experience in a relevant field
- Experience with graph databases (e.g. Neo4j), REST APIs, containerisation technologies, and workflow management systems such as Nextflow
- Experience in data visualisation and analysis
- Understanding of FAIR data principles and the biological

Interested in this role?

Apply on the company's website.
