The authors Mathias Wilhelm, Tobias Schmidt and Siegfried Gessulat.
Image: A. Eckert /TUM

Machine learning makes proteomics research more effective
Artificial intelligence boosts proteome research

Using artificial intelligence, researchers at the Technical University of Munich (TUM) have made the mass analysis of proteins from any organism significantly faster than before and almost error-free. The new approach is expected to change the field of proteomics considerably, as it can be applied in both basic and clinical research.

The genome of any organism contains the blueprints for thousands of proteins which control almost all the functions of life. Defective proteins lead to serious diseases, such as cancer, diabetes or dementia. Therefore, proteins are also the most important targets for drugs.

To better understand life processes and diseases and to develop more appropriate therapies, researchers need to analyze as many proteins as possible simultaneously. At present, mass spectrometry is used to determine the type and quantity of proteins in a biological system. However, current methods of data analysis still produce many errors.

A team at TUM led by bioinformatics scientist Mathias Wilhelm and biochemist Bernhard Küster, Professor of Proteomics and Bioanalytics, has now used proteomic data to train a neural network so that it recognizes proteins much more quickly and with almost no errors.

A solution to a serious problem

Mass spectrometers do not measure proteins directly. They analyze smaller parts consisting of amino acid sequences with up to 30 building blocks. The measured spectra of these chains are compared with databases in order to assign them to a specific protein. However, the evaluation software can only use part of the information that the spectra contain. Therefore, certain proteins are not recognized or are recognized incorrectly.
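
The database comparison described above can be illustrated with a minimal sketch. It is not the Prosit method or the scoring used by any particular evaluation software: the peptide names, peak values, bin width and the choice of cosine similarity as the score are all assumptions made for illustration. A measured spectrum is represented as a list of (m/z, intensity) peaks and compared against each reference spectrum in a small, hypothetical library.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (bin width is an assumption)."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(a, b):
    """Cosine similarity between two binned spectra (0 = no shared peaks, 1 = identical shape)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(measured, library):
    """Return the library entry whose spectrum is most similar to the measured one."""
    measured_binned = bin_spectrum(measured)
    scores = {
        name: cosine_similarity(measured_binned, bin_spectrum(peaks))
        for name, peaks in library.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical reference spectra: (m/z, intensity) pairs invented for this example.
library = {
    "PEPTIDE_A": [(175.1, 100.0), (304.2, 40.0), (417.2, 80.0)],
    "PEPTIDE_B": [(147.1, 100.0), (276.2, 60.0), (389.3, 20.0)],
}

# Hypothetical measured spectrum with slightly different intensities and one noise peak.
measured = [(175.1, 90.0), (304.2, 35.0), (417.3, 70.0), (500.0, 5.0)]

name, score = best_match(measured, library)
print(name, round(score, 3))
```

A score of this kind uses both the positions and the relative intensities of the peaks. Scoring that considers only which peaks are present would discard the intensity information, which is one way of using only part of what a spectrum contains.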

"This is a serious problem," explains Küster. The neural network developed by the TUM team uses all the information of the spectra for the process of identification. "We miss fewer proteins and make 100 times fewer mistakes," says Bernhard Küster.

Applicable to all organisms

"Prosit", as the researchers call the AI software, is "applicable to all organisms in the world, even if their proteomes have never been examined before," explains Mathias Wilhelm. "This enables research which was previously inconceivable."

With the help of 100 million mass spectra, the algorithm has been so extensively trained that it can be used for all common mass spectrometers without any additional training. "Our system is the global leader in this field," says Küster.

A market worth billions

Clinics, biotech companies, pharmaceutical companies and research institutes are using high-performance devices of this kind; the market is already worth billions. With "Prosit", it will be possible to develop even more powerful instruments in the future. Researchers and physicians will also be able to search for biomarkers in patients' blood or urine more quickly and effectively, and to monitor the effectiveness of therapies.

The researchers also have high hopes for fundamental research. "The method can be used to track down new regulatory mechanisms in cells," says Küster. "We hope to gain a considerable amount of knowledge here, which, in the medium and long term, will be reflected in the treatment of diseases suffered by humans, animals and plants."

Wilhelm also expects that "AI methods such as Prosit will soon change the field of proteomics, as they can be used in almost every area of protein research."


Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning
Siegfried Gessulat, Tobias Schmidt, Daniel Paul Zolg, Patroklos Samaras, Karsten Schnatbaum, Johannes Zerweck, Tobias Knaute, Julia Rechenberger, Bernard Delanghe, Andreas Huhmer, Ulf Reimer, Hans-Christian Ehrlich, Stephan Aiche, Bernhard Küster and Mathias Wilhelm
Nature Methods, May 27, 2019, DOI: 10.1038/s41592-019-0426-7

More information:

The study was carried out in cooperation with the companies JPT (Berlin), SAP (Potsdam) and ThermoFisher Scientific (Bremen). The project is funded by the German Federal Ministry of Education and Research (BMBF) as part of the ProteomeTools project. Prosit is available via ProteomicsDB, which is funded by the BMBF in the scope of the DIAS project.

Corporate Communications Center

Technical University of Munich Dr. Andreas Battenberg

Contacts to this article:

Dr. Mathias Wilhelm and Prof. Dr. Bernhard Küster
Professorship for Proteomics and Bioanalytics
Technical University of Munich
Emil Erlenmeyer Forum 5, 85354 Freising, Germany
Tel.: +49 8161 71 5696
