By Eduardo Longoria
On November 12th, California-based Twist Bioscience, along with Illumina, Microsoft, and Western Digital, announced the launch of a joint effort with 11 other companies to advance the field of DNA data storage. The alliance aims to create an industry roadmap that lays the foundations for a commercial file storage ecosystem, one that can accommodate the explosive growth of digital data and mitigate the negative effects of that demand.
Demands of Data Storage
It is estimated that by 2024, 30% of digital businesses will require DNA storage trials to address the gap between the amount of data they hold and the storage space available. According to Statista, global demand for storage has outstripped supply by a widening margin every year since 2010, leaving 18,100 exabytes of excess data in need of storage in the current year. This shortage of storage space not only drives up costs but also comes with economic and environmental constraints.
An estimated 2% of global greenhouse gas emissions can be attributed to data centers and their high power demand. This same power demand leads to higher electricity costs in the places where data centers are concentrated.
DNA as a Medium for Data Storage
These problems can be mitigated by storing more information on a medium that needs less space and power than current data centers. DNA and its properties are being considered as a way to provide that storage capacity.
“DNA is an incredible molecule that, by its very nature, provides ultra-high-density storage for thousands of years,” said Emily M. Leproust, Ph.D., CEO and Co-Founder of Twist Bioscience.
The human genome is approximately 3 billion base pairs long. Each base pair can be treated as 2 bits (00, 10, 01, or 11), so the whole human genome, which fits within the cell’s nucleus, contains about 6 billion bits (roughly 750 megabytes). However, this estimate covers only the DNA sequence itself and does not account for processes such as the splicing of introns, which can drastically increase this number by introducing new combinations. Regardless, the entire sequence of DNA within a human cell is estimated to stretch to about a meter in length if it were removed from the nucleus and straightened completely.
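The genome-size estimate above is simple arithmetic, and a short Python sketch makes the steps explicit. It assumes the rounded figure of 3 billion base pairs and exactly 2 bits per base pair; the variable names are illustrative only. Depending on whether a megabyte is counted as 10^6 or 2^20 bytes, the result lands between roughly 700 and 750.

```python
# Worked arithmetic for the genome-size estimate.
# Assumptions: a rounded 3 billion base pairs and 2 bits per base pair.
BASE_PAIRS = 3_000_000_000
BITS_PER_BASE_PAIR = 2  # four possible bases, so log2(4) = 2 bits

total_bits = BASE_PAIRS * BITS_PER_BASE_PAIR
total_bytes = total_bits // 8

megabytes = total_bytes / 1_000_000    # decimal megabytes (10^6 bytes)
mebibytes = total_bytes / (1024 ** 2)  # binary mebibytes (2^20 bytes)

print(f"{total_bits:,} bits")
print(f"{megabytes:.0f} MB (decimal), {mebibytes:.0f} MiB (binary)")
```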
Karin Strauss, Ph.D., Senior Principal Research Manager at Microsoft, said, “In collaboration with University of Washington, we have demonstrated a fully automated end-to-end system capable of storing and retrieving data from DNA, and we have separately stored 1GB of data in DNA synthesized by Twist and recovered data from it.”
Twist Bioscience, Illumina, Western Digital, and Microsoft are joining the Alliance as founding members. In addition to creating an industry roadmap, the DNA Data Storage Alliance is developing use cases in various markets and industries and will work to educate the larger storage community in order to promote the adoption of this future solution.
The following organizations have joined the alliance as members:
- Ansa Biotechnologies
- The Claude Nobs Foundation (Montreux Jazz Digital Project)
- DNA Script
- EPFL (École Polytechnique Fédérale de Lausanne) – Cultural Heritage & Innovation Center (Montreux Jazz Digital Project)
- ETH Zurich – The Swiss Federal Institute of Technology in Zurich, Switzerland
- Molecular Assemblies
- Molecular Information Systems Lab at the University of Washington
How to Store Digital Data in DNA?
After a file is converted from binary code to A’s, C’s, T’s, and G’s, the DNA data file is synthesized (“written”) as short segments of DNA (200 to 300 bases long) and stored. In addition to part of the data file, each short segment contains an index marking its place in the file. To retrieve the data, the segments are sequenced (“read”) and then decoded back into the original file. The system allows a chosen part of the file to be biologically selected before sequencing, in a way similar to random access memory, so only the data of interest is sequenced. It also uses error-correcting algorithms to ensure error-free encoding and decoding.
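The write and read steps described above can be illustrated with a minimal Python sketch. It is a toy model, not the scheme used by the Alliance or any of its members. The bit-to-base mapping, the 8-base index, the 192-base payload, and all function names are assumptions chosen for illustration. It also leaves out error correction and the biochemical constraints that real encodings have to respect.

```python
import random

# Assumed mapping; the article does not say which base stands for which bit pair.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

INDEX_BASES = 8      # hypothetical: 8 bases = 16 bits of index per segment
PAYLOAD_BASES = 192  # hypothetical: a full segment is then 200 bases long


def bytes_to_bases(data: bytes) -> str:
    """Convert bytes to a base string, 2 bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def bases_to_bytes(bases: str) -> bytes:
    """Convert a base string (length divisible by 4) back to bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in bases)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def encode(data: bytes) -> list:
    """Split the file into short segments, each prefixed with its index."""
    payload = bytes_to_bases(data)
    segments = []
    for number, start in enumerate(range(0, len(payload), PAYLOAD_BASES)):
        index = bytes_to_bases(number.to_bytes(INDEX_BASES // 4, "big"))
        segments.append(index + payload[start:start + PAYLOAD_BASES])
    return segments


def segment_index(segment: str) -> int:
    """Read the position of a segment from its index prefix."""
    return int.from_bytes(bases_to_bytes(segment[:INDEX_BASES]), "big")


def decode(segments: list) -> bytes:
    """Order segments by index, strip the index, and rebuild the file."""
    ordered = sorted(segments, key=segment_index)
    return bases_to_bytes("".join(seg[INDEX_BASES:] for seg in ordered))


message = b"DNA data storage stores bits as bases. " * 5
pool = encode(message)
random.shuffle(pool)  # segments stored together have no physical order
assert decode(pool) == message
print(len(pool), "segments, longest is", max(map(len, pool)), "bases")
```

Because every segment carries its own index, the pool can be shuffled and still decoded. Filtering the pool by `segment_index` before decoding is a rough software analogue of selecting only the data of interest before sequencing.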
Advantages over Traditional Storage
DNA data storage has the potential to deliver a true low-cost archival data storage solution. While current storage media have a limited lifespan and require data migration for long-term storage, DNA provides a stable medium that can be preserved for thousands of years under the right conditions. In particular, hard drives in regular use often fail after 5 years, with solid-state drives lasting a bit longer. Hard drives kept in cold storage can last for upwards of a decade, but despite all precautions, the data must eventually be migrated at great cost in labor and hardware. DNA also enables cost-effective and rapid duplication as well as incredible density: 10 full-length movies would fit into a volume the size of a grain of sand, and the DNA could be kept in appropriately sized glass beads.
The amount of data that the world produces and needs to store will only increase as the human population grows larger and more closely linked to the digital world. When major corporations innovate to create more efficient means of storage, it benefits not only them but humanity at large, in the form of a cleaner environment, lower energy prices, and safe storage.
Editor: Rajaneesh K. Gopinath, Ph.D.