TUM – Technical University of Munich
The Sequence Read Archive, a public database for the deposition of sequences, currently stores over 100,000 gene sequence datasets which previously could not be evaluated in their entirety. (Photo: Fotolia/ Dreaming Andy)

New bioinformatics tool for searching sequencing data
Big data processing enables worldwide bacterial analysis

Sequencing data from biological samples such as skin, intestinal tissue, soil or water are usually archived in public databases, which allows researchers all over the globe to access them. However, these archives now hold extremely large quantities of data, and new analysis methods are needed to explore them. Scientists at the Technical University of Munich (TUM) have developed a bioinformatics tool that lets users search all bacterial sequences in these databases with just a few mouse clicks, either to find similar sequences or to check whether a particular sequence is present.

Microbial communities are essential components of ecosystems around the world. They play a central role in key biological functions, ranging from the carbon and nitrogen cycles in the environment to the regulation of immune and metabolic processes in animals and humans. That is why many scientists are currently investigating microbial communities in great detail.

Sequencing for microbiological DNA analysis

The Sanger sequencing method, developed in 1975, was the gold standard for decoding DNA for 30 years. More recently, next-generation sequencing (NGS) technologies have triggered a new revolution: with minimal staffing, current instruments can generate as much data within 24 hours as a hundred runs of the very first DNA sequencing method.

Today, sequencing of bacterial 16S rRNA genes is the most frequently used method for identifying bacteria. The 16S rRNA genes are considered ideal molecular markers for reconstructing how closely organisms are related, because their sequence of nucleotides (the building blocks of DNA) has remained relatively conserved throughout evolution and can therefore be used to infer phylogenetic relationships between microorganisms. The acronym rRNA stands for ribosomal ribonucleic acid.
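To make the idea of comparing conserved 16S rRNA sequences a little more concrete, here is a minimal sketch that computes the percent identity of two sequences that have already been aligned to the same length. It is only an illustration under stated assumptions: the function name `percent_identity` and the short fragments are hypothetical, not real 16S rRNA data, and real phylogenetic analyses align many near-full-length genes and build trees rather than comparing toy fragments pairwise.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Return the percent identity of two pre-aligned DNA sequences.

    Both inputs must have the same length; '-' marks an alignment gap.
    Columns where both sequences have a gap are ignored; a gap opposite
    a base counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # gap in both sequences: nothing to compare
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy aligned fragments (hypothetical, NOT real 16S rRNA data)
fragments = {
    "strain_A": "ACGTACGTAC",
    "strain_B": "ACGTTCGTAC",
    "strain_C": "ACG-ACGAAC",
}

names = list(fragments)
for i, x in enumerate(names):
    for y in names[i + 1:]:
        pid = percent_identity(fragments[x], fragments[y])
        print(f"{x} vs {y}: {pid:.1f}% identity")
```

For these toy fragments the loop reports 90.0% (A vs B), 80.0% (A vs C) and 70.0% (B vs C): the more positions two sequences share, the more closely related the organisms are assumed to be.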

The Sequence Read Archive (SRA), a public database for the deposition of sequences, currently stores over 100,000 such 16S rRNA gene sequence datasets. This is because new DNA sequencing technologies have caused the volume and complexity of genome research data to grow exponentially over the past few years. The SRA therefore holds datasets that previously could not be evaluated in their entirety.

"Over all these years, a tremendous number of sequences has accumulated from human environments such as the intestine or skin, but also from soils and the ocean," explains Dr. Thomas Clavel from the Institute for Food and Health (ZIEL) at TU Munich. "We have now created a tool that allows these databases to be searched in a relatively short time in order to study the diversity and habitats of bacteria," says Clavel. "With this tool, a scientist can run a query within a few hours to find out in which types of samples a bacterium of interest, for example a pathogen from a hospital, can be found. This was not possible before." The new platform is called Integrated Microbial Next Generation Sequencing (IMNGS) and can be accessed via its website www.imngs.org.

A detailed description of how IMNGS works, illustrated with the intestinal bacterium Acetatifactor muris, has been published in the current online issue of "Scientific Reports". Registered users can run queries filtered by the origin of the bacterial data, or download entire sequences.
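The kind of query described above, finding in which types of samples a sequence of interest occurs and optionally filtering by sample origin, can be sketched as follows. This is a hypothetical illustration, not IMNGS's implementation: the article does not say how IMNGS matches sequences, so the sketch swaps in a simple k-mer Jaccard similarity, and the `Sample` class, the origin labels, `k=8` and the 0.9 similarity threshold are all assumptions chosen for readability.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Sample:
    sample_id: str
    origin: str  # hypothetical metadata label, e.g. "human gut" or "soil"
    sequences: list = field(default_factory=list)


def kmers(seq: str, k: int = 8) -> set:
    """All overlapping substrings of length k (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two k-mer sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def query_by_origin(query, samples, k=8, min_similarity=0.9, origins=None):
    """For each sample origin, return (samples with a hit, samples searched).

    A sample counts as a hit if any of its sequences reaches min_similarity
    to the query. If `origins` is given, only samples from those origins
    are searched (a stand-in for filtering by the origin of the data).
    """
    q = kmers(query, k)
    hits = defaultdict(int)
    totals = defaultdict(int)
    for s in samples:
        if origins is not None and s.origin not in origins:
            continue
        totals[s.origin] += 1
        if any(jaccard(q, kmers(seq, k)) >= min_similarity for seq in s.sequences):
            hits[s.origin] += 1
    return {o: (hits[o], totals[o]) for o in totals}


# Toy data (hypothetical sequences and metadata)
query = "ACGTACGGTTAGCCTAGGCT"
samples = [
    Sample("s1", "human gut", [query]),
    Sample("s2", "human gut", ["TTTTTTTTTTTTTTTTTTTT"]),
    Sample("s3", "soil", [query, "GGGGGGGGGGGGGGGGGGGG"]),
]
print(query_by_origin(query, samples))
print(query_by_origin(query, samples, origins={"soil"}))
```

On this toy data the first call reports one hit out of two "human gut" samples and one out of one "soil" sample; restricting `origins` to `{"soil"}` searches only the soil sample. A production tool would use proper sequence alignment or clustering against curated, well-annotated datasets, which is exactly why the quality of sample descriptions matters.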

Such bioinformatics approaches may soon become indispensable in routine clinical diagnostics. However, one critical limitation is that many members of complex microbial communities have yet to be described. "Improving the quality of sequence datasets by collecting new reference sequences is a great challenge ahead," says Clavel. "Moreover, the quality of the datasets is not yet good enough: the descriptions of individual samples in the databases are incomplete, and hence the possibilities for comparison using IMNGS are currently still limited."

Nevertheless, Clavel believes that collaboration with clinics could be a catalyst for progress, provided the databases are filled more meticulously. "If we had very well-maintained databases, we could use innovative tools such as IMNGS to possibly help diagnose chronic illnesses more rapidly," says Clavel.


Ilias Lagkouvardos, Divya Joseph, Martin Kapfhammer, Sabahattin Giritli, Matthias Horn, Dirk Haller and Thomas Clavel: IMNGS: A comprehensive open resource of processed 16S rRNA microbial profiles for ecology and diversity studies, Scientific Reports 2016. DOI: 10.1038/srep33721.


Dr. habil. Thomas Clavel
Technical University of Munich
ZIEL – Institute for Food and Health
Core Facility NGS/Microbiome
Phone: +49 81 61 71 55 34
Mail: thomas.clavel(at)tum.de

Corporate Communications Center

Technical University of Munich
