High-performance computer Super MUC at Leibniz Supercomputing Center (LRZ) in Garching.
High-performance computer Super MUC at Leibniz Supercomputing Center (LRZ) in Garching. Such computers have become important tools for cutting-edge research. TUM, LRZ, and other partners are part of an initiative aiming to create a national reserarch data infrastructure.
Image: Andreas Heddergott / TUM
  • Research news
  • Reading time: 3 MIN

National research data infrastructure: TUM involved in three consortiaMore effective sharing of research data

The German federal government and the states want to make research data more readily accessible. The goal: to build a national research data infrastructure. Nine consortia have now been chosen to develop such structures for various research areas. The Technical University of Munich (TUM) will be contributing its expertise in data science through its involvement in three of the consortia. These are concerned with research in the engineering sciences, catalysis and genomics.


Genome sequencing produces immense quantities of data. The aim of the German Human Genome-Phenome Archive (GHGA) is to make these data available to science without violating the personality rights of patients. The GHGA will focus initially on data collections pertaining to cancer and rare genetic disorders. Several nodes will be created across Germany through which researchers can access datasets. The sub-archive established under TUM's leadership at the Leibniz Supercomputing Center (LRZ) could provide access to pseudonymized datasets of Bavarian researchers on rare genetic disorders. The project is headed at TUM by Julien Gagneur, Professor of Computational Molecular Medicine, Juliane Winkelmann, Professor of Neurogenetics, and Thomas Meitinger, Professor of Human Genetics. In their part of the project, the scientists will address the challenges of databases on rare genetic disorders, among other topics. A special technical challenge with data involving these illnesses is how to make the data records comparable. The researchers will also develop the interface with which users can generate data output.


A uniform format for making experimental research data on catalytic processes accessible is still lacking. The consortium NFDI4Cat aims to create such a standard. The members will develop an information structure to enable data from heterogeneous and homogeneous catalysis – fields using highly diverse working methods – but also photo, bio- and electrocatalysis, to be captured in a uniform format that supports interoperability. The data records need to be as detailed as possible to avoid information losses, but cannot be so large that data structures can no longer be analyzed. The goal of this project is not only to make experimental processes understandable, but also to enable predictions on catalytic processes from data records from widely differing fields via machine learning. Johannes Lercher, who holds the Chair for Chemical Technology II at TUM, is working with colleagues on the subproject "Heterogeneous Catalysis". With his team he is developing proposals on such issues as which kinetics data for the materials involved must be captured for this field of study. In cooperation with the other areas, the project will attempt to find the most suitable uniform format and the best solutions for making these data accessible to researchers across Germany.


Large quantities of research data are produced in the engineering sciences as well. The vast range of topics results in wide diversity in data formats. The goal of the NFDI4ING consortium is to establish effective data management to facilitate more extensive use of the data, including for machine learning applications. The results of research in engineering will also be made more transparent and understandable. For this purpose, the team began by identifying various data archetypes, i.e. classes of research data sharing certain characteristics and specific processing challenges. Christian Stemmer, a professor at the TUM Chair of Aerodynamics and Fluid Mechanics, is a member of the steering committee and the spokesman of the subsection "High-performance Measurement and Computation". This archetype is concerned with very large datasets. These are either generated on supercomputers or measured in high-resolution experiments and can only be processed in large computing facilities. In the coming years Prof. Stemmer and his team want to study how researchers can gain access to these often very large datasets, among other challenges. One goal is to create tools that can retrieve results without bringing the data centers where they are stored to a standstill. Another working area for the scientists is defining understandable metadata as a basis for finding and reusing data records.


Corporate Communications Center

Technical University of Munich Paul Hellmich

Article at tum.de

Genome researchers frequently use machine learning models.

Intelligent algorithms for genome research

In order to find out which genes are responsible for diseases such as cancer or diabetes, scientists nowadays frequently resort to using machine-learning models. In order to give clinical researchers access to the latest...

Patienten mit Restless Legs Syndrom verspüren nachts einen starken Bewegungsdrang und leiden an unangenehmen Empfindungen wie Schmerzen oder Kribbeln in den Beinen. (Bild: burakkarademir / iStock)

Restless legs syndrome: New genetic risk variants found

Restless legs syndrome (RLS) is characterized by restless, painful legs that do not settle down at night. The causes are largely unknown. An international team led by the Technical University of Munich (TUM) and the...

Mitarbeiter von Prof. Lerchers Team im Labor des Katalyseforschungszentrums. Dr. Yue Liu, Teresa Schachtl und Daniel Melzer  (vlnr; Bild: Andreas Heddergott / TUM)

Bio-fuel from waste

Fuel from waste? It is possible. But hitherto, converting organic waste to fuel has not been economically viable. Excessively high temperatures and too much energy are required. Using a novel catalyst concept, researchers...

Visualisierung der Basenpaare des menschlichen Erbguts mit den Buchstaben G, A, T, und C.

New route to a diagnosis

In about half of all patients with rare hereditary disorders, it is still unclear what exact position of the genome is responsible for their condition. One reason for this is the enormous quantity of information encoded in...