This is an exercise for my upcoming DNA book - I know there are typos, and that things are not yet good. Also, there might be real biological mistakes in there at this stage. I am however very interested in hearing if this kind of stuff excites you! Drop me a note on email@example.com if you want to be notified when the book is done, or if you’d like to be a beta reader. Also, you can leave comments at the end of this page. Thanks!
Way back in the mists of time of 2001, my company PowerDNS wasn’t doing too well1, and I suddenly developed an overwhelming interest in.. well, anything else.
I decided to read up on human metabolism, and as I did so, I saw a lot of stuff about genes and bacteria. And to my horror, I found that I was a bit hazy on what genes, genomes, chromosomes, bacteria and viruses were precisely.
As we head into this book, there is a pretty good chance you too may not have these things on your radar as well as you’d like. So let’s briefly revisit the past 4 billion2 years of life to get us up to speed.
We’ve been studying life probably for as long as humans have been around. We have a very strong desire to understand things and to categorize them. In previous millennia, the study of life focused on what was available: the appearance and behaviour of organisms.
These days however we have vastly superior technology, one that is far more objective. All life depends on DNA, and we can read this genetic material with great ease. Computers can then dispassionately tell us how these DNA sequences, and thus their organisms, relate.
This only works because every living thing on earth runs on DNA. We do not know of a single exception. This is remarkable, because as best as we can tell, life has been around for over 4 billion years, and in all that time, nature has kept near total compatibility.
If you’ve ever tried to get some data from a 20-year-old computer, you know how stunning an achievement this is.
The DNA molecule
At the very core of life hides the DNA molecule. I still find this hard to grasp, but life stores data in actual single molecules. Know that a whole bacterium typically contains 750 kilobytes of DNA, stored in a single (long) molecule that winds its way through the micrometer-sized cell. Human DNA is stored in 46 molecules (two sets of 23), and one set weighs in at around 750 megabytes.
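To see where such numbers come from, here is a minimal sketch of the arithmetic. It assumes each DNA letter takes 2 bits (there are only four possible letters), and it uses round, illustrative letter counts rather than measured genome sizes. The name `dna_bytes` is made up for this example.

```python
BITS_PER_LETTER = 2  # assumption: four possible letters fit in 2 bits


def dna_bytes(letters: int) -> float:
    """Storage needed for `letters` DNA letters at 2 bits per letter."""
    return letters * BITS_PER_LETTER / 8


# Round numbers chosen for illustration only.
bacterium_letters = 3_000_000      # a few million letters
human_letters = 3_000_000_000      # roughly one set of human chromosomes

print(dna_bytes(bacterium_letters) / 1_000, "kilobytes")   # 750.0 kilobytes
print(dna_bytes(human_letters) / 1_000_000, "megabytes")   # 750.0 megabytes
```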
Is it reasonable to talk about kilobytes? When I first started studying DNA in 2001, this kind of talk was heretical. Kilobytes were for computers. You don’t talk about biology this way3! If anything, there was talk of ‘kilo-basepairs’.
At the time, as a computer person, I disagreed. Information is information, regardless of whether you store it on rotating magnetic disks or in a molecule in an organism.
When I wanted to learn more about how DNA powered life, I found lots of information on the molecular aspects of DNA and proteins - but little that actually put information at the center of life.
You can compare this to trying to understand a computer by studying silicon and semiconductors – useful knowledge, but it will never tell you why it is taking ages to upload a file.
DNA absolutely determines everything in nature – from simple chemical reactions to high level behaviour. The study of life started out by looking at forms, shapes and behaviour. It then moved on to understanding how all the molecules worked, including the DNA molecule.
In this book we take the next step, and think about life as powered through the information in DNA, and how this builds the molecules that lead to shapes, forms and behaviour 4.
The alphabet and the words
The DNA molecule stores four different letters, and we call these A, C, G and T. We could also call them 0, 1, 2, 3 and it would work just as well. Each of these letters is itself a molecule, and when we string and bind these molecules together, we call the result DNA.
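As a small illustration of the "0, 1, 2, 3" idea, the sketch below packs a DNA string into a single integer at 2 bits per letter and unpacks it again. The particular mapping (A=0, C=1, G=2, T=3) and the function names `pack` and `unpack` are arbitrary choices for this example; nature does not use these numbers.

```python
LETTERS = "ACGT"
CODE = {letter: number for number, letter in enumerate(LETTERS)}  # A=0, C=1, G=2, T=3


def pack(seq: str) -> int:
    """Pack a DNA string into one integer, 2 bits per letter."""
    value = 0
    for letter in seq:
        value = (value << 2) | CODE[letter]
    return value


def unpack(value: int, length: int) -> str:
    """Recover `length` letters from an integer produced by pack()."""
    letters = []
    for _ in range(length):
        letters.append(LETTERS[value & 0b11])  # lowest 2 bits hold the last letter
        value >>= 2
    return "".join(reversed(letters))


packed = pack("GATTACA")
print(bin(packed))
print(unpack(packed, 7))  # GATTACA
```

The length has to be passed to `unpack` separately, because leading A's are stored as zero bits and would otherwise be lost.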
All of life uses the same DNA alphabet5, which is already impressive enough. Three DNA letters together form a word, known as a ‘codon’. A codon specifies the chemical composition of life, so it is a very important thing.
Strikingly enough, all of life (with interesting minor exceptions) also uses the exact same table to map codons to chemicals.
This is akin to discovering all the world not only uses the same alphabet, but also shares a dictionary.
So what do we find?
For all forms of life we have now collected astounding amounts of DNA sequences. And these tell us a very clear story. There are three branches of life: two kinds of bacteria, and the rest. This is not any kind of division Linnaeus would have come up with. Traditionally you’d divide life up into “things with wings” and “plants” etc.
But the DNA tells us a very different, if baffling story. Life consists of two kinds of bacteria that have completely different fundamentals, for example building their membranes out of entirely different substances. One replicates its DNA starting in one place, the other often starts copying from several places.
Crucially, while both kinds of bacteria use DNA (of course), and can interchange genetic material, we barely find any trace of intermediate forms. Either a microbe is an archaeon, or it is a regular bacterium.
By studying the changes of DNA, we can reason back how long ago two sets of genetic material had their last common ancestor. And when we do this for bacteria and archaea, we come up with the fantastic result that both these bacteria appear to have emerged simultaneously from the dawn of time, with no known common ancestor.
This is pretty mysterious stuff of course, but what follows is even stranger. The third domain of life, which includes animals, plants, fungi and of course ourselves, clearly derives from .. an archaeon that swallowed a bacterium and made merry.
In all our cells, we can still find the remnants of a small bacterium, complete with its own DNA and its own cellular machinery. And this isn’t some kind of appendix that is hanging around for unclear reasons - this little bacterium, now known as the mitochondrion, powers everything we do.
Plants performed this stunt not just once but twice – in addition to the mitochondrion, they also ate a second bacterium. Cyanobacteria, which are still around, mastered the unique skill of using sunlight to convert air into sugar. This invention happened exactly once in nature as far as we know – and all plants rely on live-in cyanobacterial remnants (chloroplasts), complete with their own DNA machinery, to use sunlight to feed themselves.
So where does this leave us, the plants, animals and fungi? It appears we are the merger of two kinds of bacteria, still carrying around the forensic evidence of the most successful joint venture of all time.
So far we’ve discussed life – but it turns out DNA powers a lot more than life. It also powers viruses. 99% of the world’s DNA appears to live inside the viruses that prey on us and on bacteria.
People have been discussing whether viruses are alive ever since they were discovered. A virus needs a host to reproduce, so outside the host, it just sits there. Once ensconced within its victim however, there is no denying that a virus reproduces and in fact shows all properties of “being alive”.
I’ve argued previously that a virus can best be seen as software – something that is inert by itself, but becomes active on a suitable platform.
For every organism, you’ll find there are viruses that prey on them. This oddly enough even includes viruses themselves.
Viruses are not just a nuisance: there is ample evidence that they exert a very strong influence on life. By carrying genes around and by forcing organisms to mount defenses, they have left their stamp everywhere in nature.
Finally, slightly below the level of viruses lives a relatively new discovery, viroids. Viroids are almost pure information, consisting of a tiny ring of RNA (which we’ll get to). But this tiny ring of information is able to make plants very ill. The very existence of viable viroids tells us a lot about how simple the first forms of life could have been.
We’ve now covered the three main domains of life, and touched on the vast world of non-life that is still powered by DNA.
But what is in there?
Not only is the DNA molecule universal, including the A, C, G and T letters, so is the meaning of how these letters are arranged.
DNA is like one big sentence, without any paragraphs in there. Let’s start with a typical bacterium. Inside the cell floats a single circular DNA molecule, holding a few million DNA letters. Although there is no super clear definition, if a DNA molecule is big enough and vital for an organism, we call it a chromosome6.
On such a bacterial chromosome, we find genes, typically one gene per 1000 DNA letters. Now, a long time ago, there was a clear and simple definition of what a gene was. It turns out life is not so simple anymore, but from an information-centric standpoint we can focus initially on what a gene looks like.
In many ways, bacteria are lovely simple things, and their genetic organization is remarkably simple. It almost reads like a technical specification.
If the DNA letters form an alphabet, codons form words. But DNA words are all three letters long. Because of this, there are 4³=64 codons. Bacterial genes consist of a sequence of codons.
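The count of 64 is easy to check by simply listing every three-letter combination. A minimal sketch, using Python's standard `itertools.product`:

```python
from itertools import product

# Every possible three-letter word over the alphabet A, C, G, T.
codons = ["".join(letters) for letters in product("ACGT", repeat=3)]

print(len(codons))   # 64
print(codons[:4])    # ['AAA', 'AAC', 'AAG', 'AAT']
```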
Almost every bacterial gene starts with ‘ATG’, which is called the ‘start codon’. Besides the ATG start codon, there are also three stop codons (TAG, TAA, TGA). In between are codons which specify amino acids.
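To make the "technical specification" feel concrete, here is a deliberately naive sketch that looks for the first stretch of DNA that starts with ATG and, read three letters at a time, ends at a stop codon. The function name `find_gene` and the example string are invented for illustration. Real gene finding is considerably more involved than this (it has to consider both strands of the DNA, among other things), so treat this as a toy.

```python
from typing import Optional

START = "ATG"
STOPS = {"TAG", "TAA", "TGA"}


def find_gene(dna: str) -> Optional[str]:
    """Return the first ATG...stop stretch, reading codon by codon, or None."""
    start = dna.find(START)
    while start != -1:
        # Walk forward from the start codon in steps of three letters.
        for i in range(start + 3, len(dna) - 2, 3):
            if dna[i:i + 3] in STOPS:
                return dna[start:i + 3]
        # No stop codon in this reading frame: try the next ATG.
        start = dna.find(START, start + 1)
    return None


print(find_gene("CCATGGCTAAATGACC"))  # ATGGCTAAATGA
```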
Amino acids are small molecules, which when chained together form ‘peptides’ or ‘proteins’. Proteins are incredibly versatile molecules which can for example sense temperature, acidity, glucose concentration, and a million other things. In addition, proteins can cause chemical reactions to happen, or conversely, stop them in their tracks.
To a reasonable extent, everything that senses or does things in life is a protein (or was built by a protein).
Genes contain codons which specify the amino acids that should be chained together to form a protein.
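A minimal sketch of that lookup is shown below. It builds the standard codon table (the one shared by nearly all life) in a compact way and translates a gene into a string of one-letter amino acid codes, where `*` marks a stop codon. The function name `translate` and the tiny example gene are made up, and the sketch skips every intermediate step a real cell goes through.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(gene: str) -> str:
    """Translate a gene, codon by codon, until a stop codon is reached."""
    protein = []
    for i in range(0, len(gene) - 2, 3):
        amino = CODON_TABLE[gene[i:i + 3]]
        if amino == "*":  # stop codon: the protein ends here
            break
        protein.append(amino)
    return "".join(protein)


print(translate("ATGGCTAAATGA"))  # MAK
```

Here ATG becomes M (methionine), GCT becomes A (alanine), AAA becomes K (lysine), and TGA ends the protein.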
So to summarise, we have an organism, which has a chromosome, and on this chromosome there are genes, which consist of codons, which specify which amino acids go into a protein. And proteins can sense and do almost everything.
All of life is powered by DNA. Even a lot of non-life runs on DNA. There are two very different kinds of bacteria. When these merged billions of years ago, a new domain of life sprang into existence. This domain includes plants, fungi, animals and us.
Viruses and viroids are not living things by themselves, but once they are in a host, they very much are. You can see a virus as software that runs on a separate platform.
- Every organism has
  - A genome
    - Which consists of one or more chromosomes (and plasmids)
      - Which contain genes which consist of
        - Codons that specify beginning and end of a gene, plus the amino acids that make up
          - Proteins, that sense and do everything
Now – this is where most descriptions of the architecture of life declare victory, but that has always left me with a hollow feeling. It is all well and good that all these things exist, but how do they actually add up to a living organism?
For this, let us head to the next chapter.
If life being billions of years old upsets you, this likely isn’t the book for you. ↩︎
This reminds me a bit of a Second World War story. There was a huge shortage of metals. When the development of the atomic bomb required many tons of highly conductive wire, an alternate material was found: silver. But who had significant quantities of silver? The US Treasury Department! “Nichols met with Undersecretary of the Treasury Daniel Bell on August 3, 1942, to inquire about borrowing 6,000 tons of silver from Treasury vaults. In his memoirs, Nichols relates that Bell indignantly informed him that the Treasury’s unit of measure was the troy ounce”. Source: American Scientist. ↩︎
It is possible to sequence the DNA of an organism, and then to use the resulting file on disk to order up the recreation of that very same DNA. When this new DNA is inserted in a suitable cell without DNA, that cell will take up the behaviour of the original organism. Such a round trip proves beyond any doubt that DNA, the code of life, is digital. ↩︎
And a warm welcome to all actual biologists reading this book! Because this is biology, and because there are millions of different organisms, all rules have an exception somewhere. The book would become incredibly tedious however if I had to bring in every exotic counterexample. Just this once, though: a few special nucleotides (as A, C, G and T are known) do exist, but they are very much the exception. ↩︎
Bacteria also carry around ‘plasmids’. These are smaller rings of DNA that can give a bacterium additional capabilities. Plasmids can for example confer antibiotic resistance (thanks). It is fair to say that plasmids are like plugins for life. Plasmids can also be very large, and it is not clear when a plasmid becomes a chromosome, or even a ‘megaplasmid’. ↩︎