Sounds like Internet of Things? Maybe a bit more exciting?
Before getting too hyped, there are two main reasons why you’d love to store any kind of digital data in DNA: it is the most compact storage medium that we have found so far, and the most efficient one in the long run.
For the first one, let’s just remember that we are made up of trillions of microscopic cells. Each one of these very tiny things follows the instructions of DNA, which is even more microscopic. Even so, DNA contains all the necessary information to make us who we are. From our hair color to our lifespan, DNA is the king.
Then what I mean when I say “the long run” is that, if kept in optimal conditions, DNA can last up to (literally) thousands of years. You’ve probably heard of those times in which researchers extracted DNA from Neanderthals, mammoths, or other ancient beings. That’s the proof that DNA can last long… really long.
As we can see, DNA already stores a bunch of information in every living thing. So what’s different? Well, in the 21st century, we’re giving this molecule a different, yet interesting purpose: storing digital data!
The digital era
Every time that I think about the past, I feel like I’m living in the future. Weird statement, since I’m supposed to feel like I’m living in the present, aren’t I? What I’m trying to say is that with the Internet of Things, Artificial Intelligence, blockchain, quantum computing, and other technologies, our daily lives have become truly different, and we have, somehow, evolved as a species.
Everything is digital, everything is about computing. It is believed that over 2.5 quintillion bytes (2.5e+9 GB) of data are created every day! How are we keeping up with this pace? How do computers store data?
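If you want to check that conversion yourself, here is a tiny Python sketch. It assumes the short-scale quintillion (10^18) and decimal gigabytes (1 GB = 10^9 bytes); the variable names are just mine.

```python
# 2.5 quintillion bytes, with quintillion taken as 10**18 (short scale)
bytes_per_day = 25 * 10**17

# Assumption: decimal gigabytes, so 1 GB = 10**9 bytes
BYTES_PER_GB = 10**9

gb_per_day = bytes_per_day // BYTES_PER_GB
print(f"{gb_per_day:.1e} GB per day")  # 2.5e+09 GB per day
```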
Data, data, data
Computers are devices that receive, process, store, and output data. Here, we will focus on the storing part. 0s and 1s are quite famous for this. At its core, a computer operates in this binary system, but how? How is it that images, videos, documents, apps, or music can be stored in just a sequence of zeros and ones?
I didn’t understand that either until a few days ago. For me, the simplest type of data to explain is images. The word pixel is formed from two parts: picture and element. Pixels are, therefore, the elements that an image is made up of. Now, entering the computing context a bit more, a pixel is also a sequence of values. For color images, these values are called RGB, which stands for Red, Green, and Blue. In a given proportion, these values can create different colors and tones, and when more pixels with these values group together, they create a color image.
Here, when we say values, we mean numbers, usually from 0 to 255 for each color. An example of an RGB value is (253, 78, 141), which creates a pinkish color. So let’s pay attention to these numbers: they can be represented in binary code!
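To make that concrete, here is a minimal Python sketch that writes each of the three numbers as eight binary digits. I’m assuming the usual 8 bits per color (values from 0 to 255), and the helper name `channel_to_bits` is my own invention.

```python
def channel_to_bits(value: int) -> str:
    """Write one color channel (0-255) as an 8-digit binary string."""
    if not 0 <= value <= 255:
        raise ValueError("an 8-bit channel must be between 0 and 255")
    return format(value, "08b")


pixel = (253, 78, 141)  # the pinkish color from the text
pixel_bits = [channel_to_bits(channel) for channel in pixel]
print(pixel_bits)  # ['11111101', '01001110', '10001101']
```

So one pixel of that pinkish color takes 24 zeros and ones.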
If you’ve ever created or learned a strange language to tell secrets to your friends, you know that you needed to create an equivalence for what each letter was in the secret language and in English (or the language that you normally speak).
Well, when transforming RGB values, or any other kind of data, into binary code (0s and 1s), we also have these kinds of equivalences. Numbers are simply written in base 2, and for text, the most common system is called UTF-8, which stands for “Unicode Transformation Format, 8-bit”.
Before we get confused by the name: a bit is the basic unit of information in computing. It is either a 0 or a 1. A byte, in contrast, is made up of bits; normally, a byte equals 8 bits, so eight 0s or 1s. An example of a byte is “01101011”.
Therefore, UTF-8 is nothing but that equivalence between text as we see it and binary code.
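Here is a small Python sketch of that equivalence, using the built-in `str.encode`. The function name `text_to_bits` is hypothetical, just something I picked for the example. Notice that the byte “01101011” from before turns out to be the letter k.

```python
def text_to_bits(text: str) -> list[str]:
    """Encode text as UTF-8 and return one 8-digit binary string per byte."""
    return [format(byte, "08b") for byte in text.encode("utf-8")]


print(text_to_bits("k"))    # ['01101011']
print(text_to_bits("DNA"))  # ['01000100', '01001110', '01000001']
print(text_to_bits("é"))    # ['11000011', '10101001'], two bytes for one character
```

Plain English letters take one byte each, while accented letters and other symbols can take two or more.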
Once we’ve translated these colors into numbers, we can do the same and turn numbers into letters of DNA. The alphabet is fairly simple: A, C, G, T. These letters stand for the names of the chemicals: adenine, cytosine, guanine, and thymine. Normally, the order of these would determine characteristics in living things, but here, it will be the code for an image.
The most common equivalence that we can take into account for this is:
A = 00
C = 01
G = 10
T = 11
Images
Here, we can already see that DNA is more compact than binary, since one single letter can store two bits.
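Putting the pieces together, here is a minimal sketch that turns the pinkish pixel (253, 78, 141) into DNA letters with the A/C/G/T table above. The names `BITS_TO_BASE` and `bits_to_dna` are mine, and I’m assuming 8 bits per color.

```python
# The equivalence from the text: two bits per DNA letter
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def bits_to_dna(bits: str) -> str:
    """Translate a string of 0s and 1s into DNA letters, two bits at a time."""
    if len(bits) % 2 != 0:
        raise ValueError("need an even number of bits")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


pixel = (253, 78, 141)
pixel_bits = "".join(format(channel, "08b") for channel in pixel)
pixel_dna = bits_to_dna(pixel_bits)
print(pixel_dna)  # TTTCCATGGATC
```

The 24 bits of the pixel shrink to just 12 letters.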
We come back to the question: how is it that a bunch of 0s and 1s can store text, apps, or videos? I was probably wrong when I said that images would be the easiest example to start with, but wasn’t it cool?
Now, the truth is that everything, from simple text to 4K videos, can be stored with the same binary system, and thus, it can also be stored in DNA. The secret for this is, again, an encoding such as UTF-8, plus our equivalence to DNA letters.
Here’s an example of a simple program I created to turn alphanumerical data into DNA:
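The original program isn’t reproduced here, so what follows is only a minimal sketch of what such a converter could look like, not necessarily the author’s code. It assumes the text is encoded as UTF-8 and uses the A = 00, C = 01, G = 10, T = 11 table; the names `BASES`, `encode`, and `decode` are hypothetical.

```python
BASES = "ACGT"  # position 0..3 matches the bit pairs 00, 01, 10, 11


def encode(text: str) -> str:
    """Turn text into DNA letters: UTF-8 bytes, then four letters per byte."""
    dna = []
    for byte in text.encode("utf-8"):
        # Read the byte two bits at a time, most significant pair first
        for shift in (6, 4, 2, 0):
            dna.append(BASES[(byte >> shift) & 0b11])
    return "".join(dna)


def decode(dna: str) -> str:
    """Turn DNA letters produced by encode() back into text."""
    if len(dna) % 4 != 0:
        raise ValueError("expected four DNA letters per byte")
    data = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            # Shift in two bits per letter; index() rejects unknown letters
            byte = (byte << 2) | BASES.index(base)
        data.append(byte)
    return data.decode("utf-8")


print(encode("k"))             # CGGT
print(decode(encode("Hi 42")))  # Hi 42
```

Every byte becomes exactly four letters, so decoding is just the same steps in reverse.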