The prospect of storing huge amounts of data in DNA is getting closer to reality thanks to a new data recovery technique.
Microsoft seems interested in synthetic DNA. The company is considering using it in the future as a storage medium that could address the world's need for ever-increasing data storage.
Previous research has shown that just a few grams of DNA can store an exabyte of data and preserve it intact for 2,000 years.
The downside is that the method is quite expensive and extremely slow. Writing data to DNA involves converting the 0s and 1s of digital data into sequences of the four DNA bases (adenine, thymine, cytosine and guanine), and recovering the data means decoding those sequences back into 0s and 1s.
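To make the conversion concrete, here is a minimal sketch in Python. It assumes a simple mapping of two bits per base, chosen purely for illustration; the article does not describe the researchers' actual encoding, and the names BITS_TO_BASE, encode and decode are hypothetical.

```python
# Hypothetical mapping of two bits to one DNA base (illustration only;
# not the encoding scheme used by the researchers).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of DNA bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of DNA bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

Each byte becomes four bases, so the two-byte input yields an eight-base strand. A practical scheme would also need to cope with errors introduced when DNA is synthesized and sequenced, which this sketch ignores.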
Finding and retrieving specific files stored in DNA is also a very big challenge.
As scientists from Microsoft Research and the University of Washington explain, without random access, meaning the ability to selectively retrieve files from stored DNA, you would need to decode the entire data set to find the files you want. Random access would reduce the amount of processing needed for each search.
To achieve random access in DNA, they created a library of "primers" that are attached to each DNA sequence. The primers, used together with polymerase chain reaction (PCR), serve as targets for selecting the desired DNA fragments by random access.
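The idea can be sketched in Python as a loose analogy. In the lab, PCR chemically amplifies the strands that carry the chosen primers; here that step is simulated with plain string matching. The primer sequences, file names and the functions tag and select are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical primer library: one (start, end) primer pair per file.
PRIMER_LIBRARY: Dict[str, Tuple[str, str]] = {
    "video.mp4": ("ACGTAC", "TGCATG"),
    "photo.jpg": ("GGATCC", "CCTAGG"),
}


def tag(file_name: str, payloads: List[str]) -> List[str]:
    """Attach the file's primer pair to both ends of each payload strand."""
    start, end = PRIMER_LIBRARY[file_name]
    return [start + payload + end for payload in payloads]


def select(pool: List[str], file_name: str) -> List[str]:
    """Return only the payloads whose strands carry the file's primers."""
    start, end = PRIMER_LIBRARY[file_name]
    minimum = len(start) + len(end)
    return [
        strand[len(start):len(strand) - len(end)]
        for strand in pool
        if len(strand) >= minimum
        and strand.startswith(start)
        and strand.endswith(end)
    ]


# A mixed pool holding strands from two different files.
pool = tag("video.mp4", ["AAAA", "CCCC"]) + tag("photo.jpg", ["GGGG"])
print(select(pool, "photo.jpg"))  # ['GGGG']
```

Only the strands tagged for the requested file are returned, so the rest of the pool never has to be decoded.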
"Prior to synthesizing the data from a DNA file, the researchers added two ends of each DNA sequence to the primers of the PCR primer from the primer library," he says the University of Washington.
"They then used these starters to select the desired point via random access and used a new algorithm designed to more efficiently decode and restore data to their original digital state."
The researchers have also developed an algorithm for more efficient decoding and data recovery. Microsoft researcher Sergey Yekhanin said the new algorithms are more tolerant of errors in writing and reading DNA, which reduces the amount of processing required to retrieve information.
Although this is not the first time that random access to DNA has been achieved, it is the first time it has been done on such a scale, according to the researchers.
The researchers encoded 200MB of data in synthetic DNA, made up of 35 files ranging in size from 29kB to 44MB. The files contained high-definition video, audio, images and text.
Since the release of the study describing the technique, they have encoded and retrieved 400MB of data in DNA.
The researchers believe that their approach to random access will scale to large DNA pools containing several terabytes each.