A brief history of 4 billion years of life || DNA & the technology of life

Way back in the mists of time of 2001, my company @PowerDNS@ wasn’t doing too well¹, and I suddenly developed an overwhelming interest in… well, anything else.

I decided to read up on human metabolism, and as I did so, I saw a lot of stuff about genes and bacteria. To my horror, I found that I was a bit hazy on what genes, genomes, chromosomes, bacteria and viruses were precisely.

As we head into this book, there is a pretty good chance you too may not have these things on your radar as well as you’d like. So let’s briefly revisit the past 4 billion² years of life to get us up to speed.

Studying life

We’ve been studying life probably for as long as humans have been around. We have a very strong desire to understand things and categorize them. In previous millennia, the study of life focused on what was available: appearances and behavior of organisms.

However, since the middle of the twentieth century we have known that all life depends on the miracle organic polymer known as deoxyribonucleic acid (DNA), sequences of which encode genetic instructions for the functioning of organisms. Present-day technology lets us read these sequences with great ease. Computers can then dispassionately tell us how these DNA sequences, and thus their organisms, relate.

This only works because every living thing on earth runs on DNA. We do not know of a single exception. This is remarkable, because as best as we can tell, life has been around for over 4 billion years, and in all that time, nature has kept near total compatibility.

If you’ve ever tried to get some data off a 20-year-old computer, you know how stunning an achievement this is.

The DNA molecule

The DNA molecule is hidden in the very core of life. I still find this hard to grasp, but life stores data in actual single molecules. A whole bacterium typically contains around 750 kilobytes of DNA, stored in a single (long) molecule that winds its way through the micrometer-sized cell. Human DNA is stored in 46 molecules, and weighs in at around 750 megabytes.
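Where do numbers like these come from? Here is a rough sketch of the arithmetic. It assumes that each DNA letter is one of four options and therefore carries 2 bits, and it uses round genome sizes (3 million letters for a bacterium, 3 billion for a human) that are illustrative assumptions rather than measurements. The function name `dna_bytes` is made up for this example.

```python
# Back-of-the-envelope storage estimate for DNA.
# Assumption: each DNA letter is one of four options, so it carries 2 bits.
BITS_PER_LETTER = 2


def dna_bytes(letters: int) -> float:
    """Return the number of bytes needed to store `letters` DNA letters."""
    return letters * BITS_PER_LETTER / 8


# Round, illustrative genome sizes (assumptions, not measurements).
bacterium_letters = 3_000_000
human_letters = 3_000_000_000

print(f"bacterium: {dna_bytes(bacterium_letters) / 1000:.0f} kilobytes")
print(f"human:     {dna_bytes(human_letters) / 1_000_000:.0f} megabytes")
```

With these round inputs the sketch lands on 750 kilobytes and 750 megabytes. Real genomes vary a lot in size, so treat the result as an order of magnitude.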

Is it reasonable to talk about kilobytes? When I first started studying DNA in 2001, this kind of talk was heretical. Kilobytes were for computers. You don’t talk about biology this way³! If anything, there was talk of ‘kilo-base pairs’.

At the time, as a computer person, I disagreed. Information is information, regardless of whether you store it on rotating magnetic disks or in a molecule in an organism.

When I wanted to learn more about how DNA powered life, I found lots of information on the molecular aspects of DNA and proteins - but little that actually put information at the center of life.

You can compare this to trying to understand a computer by studying silicon and semiconductors – useful knowledge, but it will never tell you why it is taking ages to upload a file.

DNA absolutely determines everything in nature – from simple chemical reactions to high level behavior. The study of life started out by looking at forms, shapes and behavior. It then moved on to understanding how all the molecules worked, including the DNA molecule.

In this book we take the next step, and think about life as powered through the information in DNA, and how this builds the molecules that lead to shapes, forms and behavior ⁴.

The alphabet and the words

We have said that the DNA molecule is a polymer, which simply means it’s a sequence of smaller molecules chained together. The smaller molecules that make up DNA are called nucleotides. They come in four kinds, which we label with the letters A, C, G and T. For the purpose of describing DNA as information, we could just as well have called them 0, 1, 2, and 3.
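To make the "0, 1, 2, and 3" idea concrete, here is a minimal sketch that stores four DNA letters in every byte. The mapping A=0, C=1, G=2, T=3 and the helper names `pack` and `unpack` are choices made for this example. This is not how any particular file format or tool stores DNA.

```python
# Map each DNA letter to a 2-bit number (an arbitrary but convenient choice).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
LETTERS = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four letters per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for letter in chunk:
            byte = (byte << 2) | CODE[letter]
        # Left-align a short final chunk so unpacking stays simple.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` letters from packed bytes."""
    letters = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            letters.append(LETTERS[(byte >> shift) & 3])
    return "".join(letters[:length])


seq = "GATTACA"
packed = pack(seq)
print(len(seq), "letters ->", len(packed), "bytes")
assert unpack(packed, len(seq)) == seq
```

The length has to be kept separately, because the last byte may be only partly filled.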

All of life uses the same DNA alphabet⁵, which is already impressive enough. Three DNA letters together form a word, known as a ‘codon’. A codon specifies the chemical composition of life, so it is a very important thing.

Strikingly, all of life (with interesting minor exceptions) also uses the exact same table to map codons to chemicals.

This is akin to discovering all the world not only uses the same alphabet, but also shares a dictionary.

So what do we find?

For all forms of life we have now collected astounding amounts of DNA sequences, and these tell us a very clear story. There are three branches of life: two kinds of bacteria each with its own branch, and every other organism goes into the third. This is not any kind of division @Linnaeus@ would come up with. Traditionally you would divide life up into categories such as “things with wings” and “plants” and so on.

But the DNA tells us a very different, if baffling, story. Life consists of two kinds of bacteria that have completely different fundamentals, for example building their membranes out of entirely different substances. One replicates its DNA starting in one place, the other apparently just starts copying anywhere.

Crucially, while both kinds of bacteria use DNA (of course), and can interchange genetic material, we barely find any trace of intermediate forms. Either a microbe is an archaeon, or it is a regular bacterium.

By studying the changes of DNA, we can reason back how long ago two sets of genetic material had their last common ancestor. And when we do this for bacteria and archaea, we come up with the fantastic result that both these bacteria appear to have emerged simultaneously from the dawn of time, with no known common ancestor.
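As a very rough sketch of what "studying the changes of DNA" can mean, the snippet below counts the fraction of positions at which two sequences differ. The sequences and organism names are made up for illustration. Real analyses first align the sequences and then apply statistical models of how DNA mutates over time, none of which is attempted here.

```python
def differences(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical, made-up sequences for illustration only.
samples = {
    "organism_1": "ATGGCCATTGTAATGGGCCGC",
    "organism_2": "ATGGCCATTGTTATGGGCCGC",
    "organism_3": "ATGGACATCGTTATGGGACGT",
}

names = list(samples)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        d = differences(samples[first], samples[second])
        print(f"{first} vs {second}: {d:.2f}")
```

The fewer the differences, the more recently two sequences are likely to have shared an ancestor. In this toy data, organism_1 and organism_2 come out as the closest pair.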

This is pretty mysterious stuff of course, but what follows is even stranger. The third domain of life, which includes animals, plants, fungi and of course ourselves, clearly derives from an archaeon that swallowed a bacterium and made merry. We call the organisms in this domain ‘eukaryotes’.

In all our cells, we can still find the remnants of a small bacterium, complete with its own DNA and its own cellular machinery. And this isn’t some kind of appendix that is hanging around for unclear reasons - this little bacterium, now known as the @mitochondrion@, powers everything we do.

Plants performed this stunt not just once but twice – besides the mitochondrion, they also ate a second bacterium. @Cyanobacteria@, which are still around, mastered the unique skill of using sunlight to convert air into sugar. This invention happened exactly once in nature as far as we know – and all plants rely on live-in cyanobacterial remnants (@chloroplasts@), complete with their own DNA machinery, to use sunlight to feed themselves.

So where does this leave us, the plants, animals and fungi? It appears we are the merger of two kinds of bacteria, still carrying around the forensic evidence of the most successful @joint venture@ of all time.

Not-life

So far we’ve discussed life – but it turns out DNA powers a lot more than life. It also powers viruses. 99% of the world’s DNA appears to live inside the viruses that prey on us and on bacteria.

People have been discussing whether viruses are alive ever since they were discovered. A virus needs a host to reproduce, so outside the host, it just sits there. Once ensconced within its victim, however, there is no denying that a virus reproduces and in fact shows all properties of “being alive”.

I’ve argued in a blog post (referenced below) that a virus can best be seen as software – something that is inert by itself, but becomes active on a suitable platform.

For every organism, you’ll find there are viruses that prey on them. This oddly enough even includes viruses themselves.

Viruses are not just a nuisance: there is ample evidence that they exert a very strong influence on life. By carrying genes around and by forcing organisms to mount defenses, they have left their stamp everywhere in nature.

Finally, slightly below the level of viruses lives a relatively new discovery: viroids. Viroids are almost pure information, consisting of a tiny ring of RNA (which we’ll get to). But this tiny ring of information is able to make plants very ill. The very existence of viable viroids tells us a lot about how simple the first forms of life could have been.

The architecture

We’ve now covered the three main domains of life, and touched on the vast world of non-life that is still powered by DNA.

But what is in there?

Not only is the DNA molecule universal, including the A, C, G and T letters, so is the meaning of how these letters are arranged.

DNA is like one big sentence, without any paragraphs in there. Let’s start with a typical bacterium. Inside the cell floats a single circular DNA molecule, containing a few million DNA letters. Although there is no super clear definition, if a DNA molecule is big enough and vital for an organism, we call it a chromosome.

Note that while a bacterium is replicating, and it is always replicating, there may be additional DNA molecules present. Bacteria also carry around ‘@plasmid@s’. These are (usually) smaller rings of DNA that can give a bacterium additional capabilities. Plasmids can for example confer antibiotic resistance (thanks). It is fair to say that plasmids are like plugins for life. Plasmids can also be very large, and it is not clear when a plasmid becomes a chromosome, or even a ‘mega plasmid’.

On such a bacterial chromosome, we find genes, typically one gene per 1000 DNA letters. Now, a long time ago, there was a clear and simple definition of what a gene was. It turns out life is not so simple anymore, but from an information-centric standpoint we can focus initially on what a gene looks like.

In many ways, bacteria are lovely simple things, and their genetic organization is remarkably simple. It almost reads like a technical specification.

If the DNA letters form an alphabet, @codon@s form words. But DNA words are all three letters long. Because of this, there are 4³=64 codons. Bacterial genes consist of a sequence of codons.

Almost every bacterial gene starts with ‘@ATG@’, which is called the ‘start codon’. Besides the ATG start codon, there are also three stop codons (TAG, TAA, TGA). In between are codons which specify amino acids.
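This really does read like a technical specification, so here is a minimal sketch of it in code. It lists all 64 codons and then pulls the first start-to-stop stretch out of a toy sequence. The sequence is invented and the function name `first_gene` is made up for this example. Real gene finders also check both strands, minimum lengths and much more.

```python
from itertools import product

START = "ATG"
STOPS = {"TAG", "TAA", "TGA"}

# All three-letter words over a four-letter alphabet: 4**3 = 64 codons.
ALL_CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def first_gene(dna: str) -> list[str]:
    """Return the codons of the first start-to-stop stretch, or [] if none."""
    begin = dna.find(START)
    while begin != -1:
        codons = []
        # Read three letters at a time, starting at the start codon.
        for i in range(begin, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            codons.append(codon)
            if codon in STOPS:
                return codons
        # No stop codon in this reading frame: try the next ATG.
        begin = dna.find(START, begin + 1)
    return []


# Hypothetical toy sequence, not taken from a real organism.
toy = "CCATGGCTAAATGACC"
print(len(ALL_CODONS))   # 64
print(first_gene(toy))   # ['ATG', 'GCT', 'AAA', 'TGA']
```

Note that the "TAA" hiding in the middle of the toy sequence is ignored, because it does not line up with the three-letter steps that begin at ATG.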

Amino acids are small molecules, which when chained together form ‘peptides’ (when the chains are short) or ‘proteins’ (when they are longer). Proteins are incredibly versatile molecules which can for example sense temperature, acidity, glucose concentration, and a million other things. In addition, proteins can cause chemical reactions to happen, or conversely, stop them in their tracks.

To a reasonable extent, everything that senses or does things in life is a protein (or was built by a protein).

Genes contain codons which specify the amino acids that should be chained together to form a protein.

So to recap, we have an organism, which has one or more chromosomes, and on these chromosomes there are genes, which contain codons, which specify which amino acids go into a protein. And proteins can sense and do almost everything.
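The last step of that recap, from codons to a protein, can be sketched as a simple table lookup. The table below is the standard genetic code written in the usual one-letter amino acid abbreviations, with `*` marking a stop codon. The toy gene and the function name `translate` are made up for this example, and the minor exceptions to the shared table are ignored.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(gene: str) -> str:
    """Translate a gene (starting at its start codon) into a protein string."""
    protein = []
    for i in range(0, len(gene) - 2, 3):
        amino_acid = CODON_TABLE[gene[i:i + 3]]
        if amino_acid == "*":  # stop codon: the protein ends here
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy gene: start codon, two amino acid codons, stop codon.
print(translate("ATGGCTAAATGA"))  # MAK
```

Here M, A and K stand for the amino acids methionine, alanine and lysine. Note that the start codon ATG also codes for an amino acid of its own.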

Chapter Summary

All of life is powered by DNA. Even a lot of non-life runs on DNA. There are two very different kinds of bacteria. When these merged billions of years ago, a new domain of life sprang into existence. This domain includes plants, fungi, animals and us.

Viruses and viroids are not living things by themselves, but once they are in a host, they very much are. You can see a virus as software that runs on a separate platform.

Meanwhile, every organism has:

A genome
Which consists of one or more chromosomes (and plasmids)
Which contain genes which consist of
Codons that specify beginning and end of a gene, plus the amino acids that make up
Proteins, that sense and do everything

Now – this is where most descriptions of the architecture of life declare victory, but that has always left me with a hollow feeling. It is all well and good that all these things exist, but how do they actually add up to a living organism?

For this, let us head to the next chapter.
