This is an exercise for my upcoming DNA book - I know there are typos, and that things are not yet good. Also, there might be real biological mistakes in there at this stage. I am however very interested in hearing if this kind of stuff excites you! Drop me a note if you want to be notified when the book is done, or if you’d like to be a beta reader. Also, you can leave comments at the end of this page. Thanks!

Welcome to this somewhat quirky glossary of terms used in this book. The ordering is more thematic than alphabetic. I’ve tried to keep things brief so you can still quickly scan for what you need to know. Full details can be found elsewhere in the book.

DNA: The “permanent record” of the genetic code. So permanent that we can still recover 50,000-year-old DNA.

RNA: A more fleeting form of genetic storage. Besides storing information, RNA molecules can also catalyze chemical reactions and otherwise actually make things happen.

Base/nucleotide/nucleobase: the elementary unit of genetic information. A, C, G, U and T. U is used only in RNA, where it takes the place of T. Every position in our genetic code can be one of four bases. Four bases together can encode 4*4*4*4 = 256 different sequences, the same number of values as 1 digital byte. Every base therefore describes 2 bits of information.
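To make the "four bases fit in one byte" claim concrete, here is a minimal Python sketch. The mapping of bases to bit patterns is my own arbitrary choice for illustration (nature has no such table), and the function names are made up.

```python
# Hypothetical 2-bit encoding; which base gets which code is an arbitrary choice.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def pack_four_bases(bases: str) -> int:
    """Pack exactly four DNA bases into one byte value (0..255)."""
    if len(bases) != 4:
        raise ValueError("expected exactly four bases")
    value = 0
    for base in bases:
        # Shift the bits collected so far and append the next base's 2 bits.
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def unpack_byte(value: int) -> str:
    """Turn one byte value back into four DNA bases."""
    return "".join(BITS_TO_BASE[(value >> shift) & 0b11] for shift in (6, 4, 2, 0))


print(4 ** 4)                   # 256 possible four-base sequences
print(pack_four_bases("GATC"))  # G=10 A=00 T=11 C=01 -> 0b10001101 = 141
print(unpack_byte(141))         # GATC
```

Every one of the 256 byte values maps to exactly one four-base sequence and back, which is all the "2 bits per base" statement means.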

Prokaryote: Organism with no nucleus. The genetic material floats freely in the cell. Of extremely ancient origin, possibly 4 billion years old. While simpler, their way of life is very robust. With a short generation time, they can evolve very quickly.

Eukaryote: organism, multicellular or not, with cells that have a nucleus. Typically far more complex and advanced.

Nucleus: Guards the genetic material. Human chromosomes total 2 meters in length, and these are packed into a 10 micrometer nucleus. To achieve this stunning feat of cable management, chromosomes are wrapped around spools built out of things called histones.
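A quick back-of-the-envelope check of that packing feat, as a Python sketch. Two numbers here are assumptions not stated in this glossary: the textbook spacing of about 0.34 nanometers per base pair, and a rough figure of 6.4 billion base pairs for both copies of the human genome together.

```python
# Assumed figures (not from this glossary): 0.34 nm per base pair is the
# textbook spacing for ordinary double-stranded DNA, and ~6.4 billion base
# pairs is an approximate size for both copies of the human genome together.
NM_PER_BASE_PAIR = 0.34
DIPLOID_BASE_PAIRS = 6.4e9
NUCLEUS_DIAMETER_M = 10e-6  # the 10 micrometers mentioned in the text

dna_length_m = DIPLOID_BASE_PAIRS * NM_PER_BASE_PAIR * 1e-9
ratio = dna_length_m / NUCLEUS_DIAMETER_M

print(f"total DNA length: {dna_length_m:.2f} m")
print(f"the DNA is about {ratio:,.0f} times longer than the nucleus is wide")
```

Under these assumptions the total comes out at a bit over 2 meters, so the DNA is on the order of 200,000 times longer than the nucleus it has to fit in.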

Bacteria: One of the two classes of prokaryotes. Are very different from eukaryotes in almost every way, specifically in how DNA gets replicated.

Archaea: The other class of prokaryotes. Despite the name, these are not more ancient or simpler than bacteria. Many archaea are extremophiles and can survive in very hot, dry or radioactive places. Not a single archaeon is known that is harmful to human beings. Some of them live in our intestines. Archaea are in many ways closer to us eukaryotes, in the sense that they use more similar molecular and DNA techniques.

Three domain system: The idea that life consists of archaea, bacteria and eukaryotes. The theory is that eukaryotes formed two billion years ago when an archaeon ate a bacterium, and that both survived the experience. To this day, all eukaryotes have built-in mini-bacteria that power our metabolism. We call these mitochondria.

Virus: not life. This book devotes a whole chapter to what is life and what isn’t. A virus has genetic material, but it can’t itself do anything. Only when a virus enters a host organism can it start doing things. Viruses are therefore pretty close to software, which needs a machine to run on. Viruses infect eukaryotes, prokaryotes and surprisingly, also viruses. A bacterial virus is called a (bacterio)phage, and they honestly look like this:

(Image of a bacteriophage, by Wikipedia user Adenosine.)

Chromosome: a large amount of DNA, forming the hereditary material of an organism. Most bacteria have one chromosome, some have several. Eukaryotes typically have lots (dozens) of huge chromosomes, some carrying over 100 million nucleotides. Can be circular (as in most bacteria and organelles) or linear.

Plasmid: Bacteria and archaea also have smaller rings of DNA that are not crucial to their survival. There is no molecular difference between a chromosome and a plasmid; the difference is entirely functional. Bacteria can also exchange plasmids and gain features, like antibiotic resistance (thanks).

Aglet: The short plastic or metal bit at the end of a shoelace that keeps the fibers from unraveling.

Telomere: The aglet of non-circular chromosomes.

Proteins/peptides: the most versatile and powerful molecules in life. A small protein is called a peptide. Proteins can sense their environment, and react to things like temperature, acidity, chemical concentrations, magnetic fields (!!), light of specific colors, vibrations, pressure and likely many other things.

In addition, proteins can function as catalysts that effectively make certain chemical reactions happen.

Finally proteins can provide physical infrastructure, like hair, skin or bacterial motors.

Amino acid: Proteins are built out of chains of around 20 different amino acids. Amino acids attract and repel each other but also other molecules, like for example RNA and DNA. Since amino acids have mutual interactions, a specific amino acid sequence will fold up to a shape. This shape can be useful as physical infrastructure, to catalyze chemical reactions or to sense the environment.

Gene: We used to think that one gene would contain the genetic material for one protein, and that such a protein would have a clearly defined function. No such luck. Functionally we know what a gene is, but we now know a gene can lead to thousands of different proteins. We also used to think that more genes equalled more complexity, but a potato has more genes than we do. Perhaps the potato knows something we don’t.

Transcription: The conversion of the more permanent DNA to the more fleeting RNA. Typically only relevant sections of DNA are transcribed to RNA, for example a single gene. Compare this to opening a file on your computer - the contents of that file are then copied from disk (or SSD) to RAM, so they can easily be processed and manipulated.
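Sticking with the file analogy, transcription can be sketched in a few lines of Python. This is a simplification: the real machinery reads the opposite (template) strand and builds the complementary RNA, but the result reads the same as the coding strand with every T swapped for U. The sequence and the gene coordinates below are made up.

```python
def transcribe(coding_strand: str) -> str:
    """Return the RNA version of a DNA coding-strand sequence (T becomes U)."""
    dna = coding_strand.upper()
    if set(dna) - set("ACGT"):
        raise ValueError("sequence may only contain A, C, G and T")
    return dna.replace("T", "U")


# Made-up stretch of "chromosome" with hypothetical coordinates for one gene.
chromosome = "GGCTATAAATGGCCATTGTAATGGGCCGC"
gene_start, gene_end = 8, 23

# Only the relevant section is copied to RNA, not the whole chromosome.
mrna = transcribe(chromosome[gene_start:gene_end])
print(mrna)  # AUGGCCAUUGUAAUG
```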

Codon: three nucleotides that specify an amino acid, or signify that a gene is about to begin (start codon) or end (stop codon).

Translation by the ribosome: conversion of RNA to amino acids and thus proteins. Performed by the ribosome, an ancient molecular machine that recruits amino acids based on codons ingested from RNA.
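Here is a toy Python version of what the ribosome does, as a sketch only. The codon table is a small excerpt of the standard genetic code (the full table has 64 entries), the example mRNA is made up, and "start at the first AUG" is a simplification of how real cells pick a start site.

```python
# A small excerpt of the standard genetic code; the full table has 64 codons.
CODON_TABLE = {
    "AUG": "Met",  # also the usual start codon
    "UUU": "Phe", "UUC": "Phe",
    "GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "GCU": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",
    "AAA": "Lys", "AAG": "Lys",
    "UGG": "Trp",
    "UAA": None, "UAG": None, "UGA": None,  # stop codons
}


def translate(mrna: str) -> list[str]:
    """Read codons from the first AUG until a stop codon (toy model)."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon not in CODON_TABLE:
            raise KeyError(f"codon {codon} is not in this toy table")
        amino_acid = CODON_TABLE[codon]
        if amino_acid is None:  # stop codon: release the chain
            break
        protein.append(amino_acid)
    return protein


print(translate("GGAUGUUUGGCAAAUGGUAAGC"))
# ['Met', 'Phe', 'Gly', 'Lys', 'Trp']
```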

tRNA: To match codons to amino acids, nature has invented transfer RNA or tRNA. These are specially formed strings of RNA that on one side bind to the proper codon (via their anti-codon), and on the other side attract the right amino acid.
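The codon/anti-codon match is plain base pairing, read in the opposite direction. A minimal Python sketch, assuming perfect pairing only (real tRNAs tolerate some mismatch at the third position, which I ignore here). The three-entry tRNA pool is a hypothetical excerpt for illustration.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    """Return the perfectly matching anticodon, written 5' to 3'."""
    # The two strands pair antiparallel, so reverse before complementing.
    return "".join(RNA_COMPLEMENT[base] for base in reversed(codon))


# Hypothetical excerpt of a tRNA pool: anticodon -> amino acid carried.
TRNA_POOL = {"CAU": "Met", "GAA": "Phe", "CCA": "Trp"}


def recruit(codon: str):
    """Return the amino acid delivered for this codon, or None if no tRNA fits."""
    return TRNA_POOL.get(anticodon_for(codon))


print(anticodon_for("AUG"))  # CAU
print(recruit("UUC"))        # Phe
print(recruit("UAA"))        # None: no tRNA in this pool matches a stop codon
```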

16s rRNA:

Exons, introns: eukaryotic DNA is... highly complex. Where a bacterial piece of DNA gets transcribed straightforwardly into RNA that then goes to the bacterial ribosome, eukaryotic DNA requires far more handling.

For reasons we may never understand, huge amounts of our genes consist of intron ‘fillers’. These fillers are faithfully converted to RNA, and then spliced out again. What these introns then go on to do is mostly a mystery. It is a bit like a novel that consists of small pieces of story, followed by hundreds of pages that we don’t read, but still have to print somehow.

The bits of DNA that do end up in the RNA used by the ribosome are called ‘exons’. By skipping some exons, life manages to mix up many different mRNA molecules from the same gene. This is called alternative splicing, but it is very mainstream: 95% of our genes get this treatment.
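To get a feel for how exon skipping multiplies the number of possible messages, here is a Python sketch. It assumes a deliberately simple rule that is my own: the first and last exon are always kept, every exon in between may be kept or skipped, and the order never changes. The exon sequences are made up.

```python
from itertools import combinations


def splice_variants(exons: list[str]) -> list[str]:
    """List every mRNA obtainable by skipping any subset of the internal exons."""
    first, *middle, last = exons  # needs at least two exons
    variants = []
    # Go from "keep all internal exons" down to "skip them all".
    for keep_count in range(len(middle), -1, -1):
        for kept in combinations(middle, keep_count):  # order is preserved
            variants.append("".join([first, *kept, last]))
    return variants


exons = ["AUGGCC", "UUU", "AAAGGG", "UGGUAA"]  # made-up exon sequences
for variant in splice_variants(exons):
    print(variant)
```

With two internal exons this prints 4 variants. Under this rule the count doubles with every extra internal exon, so a gene with 10 skippable exons could in principle yield 1024 different mRNAs.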

Junk DNA: An outdated and imprecise way to refer to DNA that does not end up in proteins or as RNA. Not a helpful term.

Untranslated regions: When the introns are spliced out, we end up with a stretch of RNA which we call mRNA, short for messenger RNA. This is the message to the cell with instructions on what to do. Both the beginning and end of mRNA do not consist of codons (which code for amino acids). Instead there are the 5' and 3' UnTranslated Regions, or UTRs. These UTRs influence when, how often and how long a stretch of mRNA will continue to be converted into amino acids/proteins.
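The layout of an mRNA (5' UTR, then the codons, then the 3' UTR) can be sketched in Python like this. It assumes the coding part runs from the first AUG to the first stop codon in the same reading frame, which is a simplification, and the example sequence is made up.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_mrna(mrna: str) -> tuple[str, str, str]:
    """Split an mRNA into (5' UTR, coding region, 3' UTR) in a toy model."""
    start = mrna.find("AUG")
    if start == -1:
        raise ValueError("no start codon found")
    # Walk codon by codon from the start codon until an in-frame stop codon.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3  # the stop codon counts as part of the coding region
            return mrna[:start], mrna[start:end], mrna[end:]
    raise ValueError("no in-frame stop codon found")


five_utr, coding, three_utr = split_mrna("GGACCAUGGCCAAAUGGUAACUUUCCAAAA")
print(five_utr)   # GGACC
print(coding)     # AUGGCCAAAUGGUAA
print(three_utr)  # CUUUCCAAAA
```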

Non-coding RNA: The protein-coding genes get a lot of attention, but there is a similar number of genes that apparently are not meant to be processed by the ribosome. They still lead to RNA in the cell however. A special case is the tRNA adapters for use by the ribosome, but many other RNA constructs are known. To be honest, this is a pretty misty area of biology. There are “long non-coding RNAs”, or lncRNAs. There are also shorter variants. Some of these may also actually be processed by the ribosome. It is quite surprising how little we know of non-coding RNA, given how much of it there appears to be.

RNA Interference:

Transposable element / transposon / TE: These are DNA sequences that sometimes manage to get themselves duplicated in a chromosome, to a new position. This is a runaway process - any TE that on average creates more than 1 copy of itself will expand with every generation. Around 50% of a typical complex genome appears to consist of such ‘selfish’ copies. There are various families of transposable elements, the most numerous of which (‘Alu’) alone makes up about 10% of the human genome. Cells fight TEs explicitly, but it appears some transposons have become part of essential genes.
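The "runaway" part is just compound growth. A minimal Python sketch, assuming a deliberately crude model of my own in which every copy leaves behind the same average number of copies each generation, and nothing else (selection, cellular defenses) interferes:

```python
def te_copies(initial_copies: float, copies_per_generation: float, generations: int) -> float:
    """Expected copy count after a number of generations (geometric growth)."""
    return initial_copies * copies_per_generation ** generations


# Just below, exactly at, and just above the break-even point of 1 copy.
for rate in (0.99, 1.0, 1.01):
    print(rate, te_copies(1, rate, 1000))
```

In this model an average of 0.99 copies per generation makes the element fade away, while 1.01 copies turns a single element into tens of thousands after 1000 generations. A tiny edge is enough.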



Sigma factor:


Shine-Dalgarno sequence:

Okazaki fragment: