Genome researchers frequently use machine learning models.
Image: nicolas_

New repository "Kipoi" improves access to machine learning models
Intelligent algorithms for genome research

To find out which genes are responsible for diseases such as cancer or diabetes, scientists now frequently use machine learning models. To give clinical researchers access to the latest algorithms, Prof. Julien Gagneur from the Technical University of Munich (TUM) has set up a new repository called "Kipoi" in collaboration with scientists from other universities and research institutes.

Although machine learning methods have steadily gained importance in genome research in recent years, researchers have often had to fall back on outdated software, and scientists in clinical research often did not have access to the most recent models. The new free, open-access repository is set to change this: Kipoi enables an easy exchange of machine learning models in the field of genome research. The repository was created by Julien Gagneur, Assistant Professor of Computational Biology at TUM, in collaboration with researchers from the University of Cambridge, Stanford University, the European Bioinformatics Institute (EMBL-EBI) and the European Molecular Biology Laboratory (EMBL).

"Kipoi provides very exciting opportunities to understand individual genomes." (Julien Gagneur, Assistant Professor of Computational Biology)

Trained models freely available

"What makes Kipoi special is that it provides free access to machine learning models that have already been trained," says Julien Gagneur. "What we are doing with Kipoi is not just sharing data and software, but sharing models and algorithms that are already trained on the most relevant data. These models are ready to use, because all the cumbersome work of applying them to data has already been done," says Anshul Kundaje, Assistant Professor at Stanford. More than 2,000 trained models are currently freely accessible on Kipoi. In a recent study published in Nature Biotechnology, the researchers show that the new repository will accelerate exchange in the genomics community and thereby advance genome research.

Fast algorithms and easy operation

Because Kipoi simplifies access to already trained models, researchers can perform transfer learning. In transfer learning, a model that has already been trained on one dataset serves as the starting point for a similar task, which it can then learn faster. Kipoi also simplifies the process of feeding data into the models stored there: standardized file formats and software frameworks reduce the installation and execution of a model to three simple commands. As a result, even those with no previous experience in machine learning can easily use the repository.
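
The idea behind transfer learning can be illustrated with a deliberately small sketch. It is not Kipoi code and does not use genomic data: the linear model, the two toy tasks, and all function names (`mse`, `gd_step`, `steps_to_fit`) are assumptions made purely for illustration. A model is first fitted to a source task and then reused as the starting point for a similar target task, and the number of training steps is compared with training from scratch.

```python
# Toy illustration of transfer learning (hypothetical example, not Kipoi code).
# A one-variable linear model y = w * x + b is fitted by gradient descent.

def mse(w, b, xs, ys):
    """Mean squared error of the linear model on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gd_step(w, b, xs, ys, lr=0.1):
    """One gradient-descent update of (w, b) on the mean squared error."""
    n = len(xs)
    grad_w = 2 * sum((w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = 2 * sum((w * x + b - y) for x, y in zip(xs, ys)) / n
    return w - lr * grad_w, b - lr * grad_b

def steps_to_fit(w, b, xs, ys, tol=1e-3, max_steps=10_000):
    """Train until the error drops to tol; return (steps used, w, b)."""
    steps = 0
    while mse(w, b, xs, ys) > tol and steps < max_steps:
        w, b = gd_step(w, b, xs, ys)
        steps += 1
    return steps, w, b

xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
source_ys = [3.0 * x + 1.0 for x in xs]  # source task (assumed)
target_ys = [3.2 * x + 0.8 for x in xs]  # similar target task (assumed)

# "Pre-train" on the source task, starting from zero parameters.
_, w_pre, b_pre = steps_to_fit(0.0, 0.0, xs, source_ys, tol=1e-8)

# Learn the target task from scratch and from the pre-trained parameters.
steps_scratch, _, _ = steps_to_fit(0.0, 0.0, xs, target_ys)
steps_transfer, _, _ = steps_to_fit(w_pre, b_pre, xs, target_ys)

print("steps from scratch:", steps_scratch)
print("steps with pre-trained start:", steps_transfer)
```

In this toy setting the pre-trained start needs fewer steps because its parameters already lie close to those of the target task. Real genomics models are far larger, but the principle of reusing trained parameters is the same.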

Understanding individual genomes

As Kipoi is oriented towards models that link genotype and phenotype, the new platform will make it easier to identify genetic causes of disease: "Kipoi puts the latest deep learning models trained on massive genomics data at the fingertips of clinical researchers," says Julien Gagneur. "This provides very exciting opportunities to understand individual genomes, for instance to pinpoint genetic variants causing diseases or to interpret mutations occurring in tumors."

However, the extent of the platform's contribution to genomic research will also depend on the genomics community. "We hope that in the future more researchers will bring their models to our repository," says Oliver Stegle, team leader at the EMBL-EBI. "That is the only way we can make genomics analysis accessible and ultimately make a wider range of predictive machine learning tools available to the genomics community."


Avsec, Z.; Gagneur, J.; Kundaje, A.; Stegle, O. et al. (2019). The Kipoi repository accelerates community exchange and reuse of predictive models for genomics. Nature Biotechnology. Published online 28/05/19; DOI: 10.1038/s41587-019-0140-0.

Technical University of Munich

Corporate Communications Center Lisa Pietrzyk

Contact for this article:

Prof. Dr. Julien Gagneur
Technical University of Munich
Professor for Computational Biology
Phone: +49 (89) 289 - 19411
