Prof. Heckel, Biohackers is about a medical student seeking revenge on a professor with a dark past – and the manipulation of DNA with biotechnology tools. You were commissioned to store the series on DNA. How does that work?
First, I should mention that what we're talking about is artificially generated – in other words, synthetic – DNA. DNA consists of four building blocks: the nucleotides adenine (A), thymine (T), guanine (G) and cytosine (C). Computer data, meanwhile, are coded as zeros and ones. The first episode of Biohackers consists of a sequence of around 600 million zeros and ones. To code the sequence 01 01 11 00 in DNA, for example, we decide which number combinations will correspond to which letters. For example: 00 is A, 01 is C, 10 is G and 11 is T. Our example then produces the DNA sequence CCTA. Using this principle of DNA data storage, we have stored the first episode of the series on DNA.
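The two-bits-per-letter mapping described above can be sketched in a few lines of Python. This is only an illustration of the translation step as explained in the interview; the function name `bits_to_dna` is an assumption made for this example and not part of any published tool.

```python
# Mapping taken from the interview: 00 -> A, 01 -> C, 10 -> G, 11 -> T.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def bits_to_dna(bits: str) -> str:
    """Translate a string of 0s and 1s into DNA letters, two bits per letter.

    Hypothetical helper for illustration; spaces in the input are ignored.
    """
    bits = bits.replace(" ", "")
    if set(bits) - {"0", "1"}:
        raise ValueError("input may only contain 0, 1 and spaces")
    if len(bits) % 2 != 0:
        raise ValueError("number of bits must be even")
    # Walk through the bit string in steps of two and look up each pair.
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


print(bits_to_dna("01 01 11 00"))  # CCTA
```

At two bits per nucleotide, the roughly 600 million zeros and ones of the first episode would correspond to about 300 million nucleotides before any redundancy is added.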
And to view the series – is it just a matter of "reverse translation" of the letters?
In a very simplified sense, you can visualize it like that. When writing, storing and reading the DNA, however, errors occur. If these errors are not corrected, the data stored on the DNA will be lost. To solve the problem, I have developed an algorithm based on channel coding. This method involves correcting errors that take place during information transfers. The underlying idea is to add redundancy to the data. Think of language: When we read or hear a word with missing or incorrect letters, the computing power of our brain is still capable of understanding the word. The algorithm follows the same principle: It encodes the data with sufficient redundancy to ensure that even highly inaccurate data can be restored later.
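To make the idea of redundancy concrete, here is a minimal sketch that uses the simplest possible channel code, a repetition code with majority voting. It is not the algorithm developed by Prof. Heckel, which the interview does not describe in detail; the function names and the three example reads are assumptions made for illustration. The sketch only handles substituted letters, not inserted or missing ones.

```python
from collections import Counter

# Reverse of the mapping from the interview: A -> 00, C -> 01, G -> 10, T -> 11.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}


def majority_vote(reads: list[str]) -> str:
    """Pick the most common letter at each position across several reads.

    Hypothetical helper: assumes all reads have the same length, i.e. that
    errors are substitutions only.
    """
    if len({len(read) for read in reads}) != 1:
        raise ValueError("all reads must have the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def dna_to_bits(dna: str) -> str:
    """'Reverse translation' of DNA letters back into zeros and ones."""
    return "".join(BASE_TO_BITS[base] for base in dna)


# Three stored copies of CCTA; the second read contains one wrong letter.
reads = ["CCTA", "CGTA", "CCTA"]
recovered = majority_vote(reads)
print(recovered, dna_to_bits(recovered))  # CCTA 01011100
```

Storing every strand three times triples the amount of DNA needed, which is why practical schemes aim for codes that correct the same errors with far less redundancy.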
Channel coding is used in many fields, including in telecommunications. What challenges did you face when developing your solution?
The first challenge was to create an algorithm specifically geared to the errors that occur in DNA. The second one was to make the algorithm so efficient that the largest possible quantities of data can be stored on the smallest possible quantity of DNA, so that only the absolutely necessary amount of redundancy is added. We demonstrated that our algorithm is optimized in that sense.
DNA data storage is very expensive because of the complexity of DNA production as well as the reading process. What makes DNA an attractive storage medium despite these challenges?
First, DNA has a very high information density, which permits the storage of enormous data volumes in a minimal space. In the case of the TV series, we stored "only" 100 megabytes on a picogram of DNA, that is, a trillionth of a gram. Theoretically, however, it would be possible to store up to 200 exabytes on one gram of DNA. Second, DNA lasts a long time. By comparison: If you never turned on your PC or wrote data to the hard disk it contains, the data would disappear after a couple of years. By contrast, DNA can remain stable for many thousands of years if it is packed right.
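As a back-of-the-envelope check of these figures, the following sketch computes how much DNA 100 megabytes would occupy at the theoretical limit of 200 exabytes per gram. It assumes decimal units (1 megabyte = 10^6 bytes, 1 exabyte = 10^18 bytes, 1 picogram = 10^-12 gram); the variable names are chosen for this example only.

```python
# Assumed decimal unit definitions.
BYTES_PER_MEGABYTE = 10**6
BYTES_PER_EXABYTE = 10**18
PICOGRAMS_PER_GRAM = 10**12

# Figures quoted in the interview.
theoretical_bytes_per_gram = 200 * BYTES_PER_EXABYTE
episode_bytes = 100 * BYTES_PER_MEGABYTE

# Mass of DNA needed for 100 megabytes at the theoretical density, in picograms.
picograms_needed = episode_bytes * PICOGRAMS_PER_GRAM / theoretical_bytes_per_gram

print(picograms_needed)  # 0.5
```

Under these assumptions, 100 megabytes at the theoretical limit would take up about half a picogram of DNA.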
And the method you have developed also makes the DNA strands durable – practically indestructible.
My colleague Robert Grass was the first to develop a process for the "stable packing" of DNA strands by encapsulating them in nanometer-scale spheres made of silica glass. This ensures that the DNA is protected against mechanical influences. In a joint paper in 2015, we presented the first robust DNA data storage concept, combining our algorithm with the encapsulation process developed by Prof. Grass. Since then we have continuously improved our method. In our most recent publication, which appeared in Nature Protocols in January 2020, we passed on what we have learned.
What are your next steps? Does data storage on DNA have a future?
We're working on a way to make DNA data storage cheaper and faster. "Biohackers" was a milestone en route to commercialization. But we still have a long way to go. If this technology proves successful, big things will be possible. Entire libraries, all movies, photos, music and knowledge of every kind – provided it can be represented in the form of data – could be stored on DNA and would thus be available to humanity for eternity.
Linda C. Meiser, Philipp L. Antkowiak, Julian Koch, Weide D. Chen, A. Xavier Kohll, Wendelin J. Stark, Reinhard Heckel, Robert N. Grass: "Reading and writing digital data in DNA". Published in Nature Protocols in January 2020. DOI: 10.1038/s41596-019-0244-5
BBC video about Prof. Heckel and Prof. Grass: "This is how to store human knowledge for eternity".
Contact for this article:
Prof. Dr. Reinhard Heckel
Technical University of Munich
Professor of Machine Learning
Tel: +49 89 289 28527