Is biology too complex to ever understand?

Note: This article covers a lot of well-trodden ground, although this post has the benefit of 1) being rather brief and 2) advancing slightly on from earlier work. Noam Chomsky wrote 39 dense pages of philosophy on this subject in “Mysteries of Nature: How Deeply Hidden”. Evolutionary biology’s leading light Ernst Mayr wrote a whole book called “What Makes Biology Unique?”. Finally, a lot of this thinking is also covered in the quite philosophical work “The Way of the Cell” by Franklin M. Harold.

Two centuries ago, biology was a very fuzzy science, involving observing animal and plant behaviour and the physiology of life, through ever more sophisticated means. It crucially did not involve math nor were there many hard and fast rules.

For example, many animals and plants were grouped together as species, clades, or families based on their outward appearance or behaviour, which led to some plausible choices that we now know to have been spectacularly wrong.

Starting in 1856, with Gregor Mendel’s pea experiments, biology suddenly became explicitly digital. The shape, color and height of certain plant parts turned out to either be inherited, or not. In addition, when crossing strains, strong predictions could be made about which fraction of the offspring would retain a feature. Mendel thus formulated the Laws of Segregation and Independent Assortment.
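To make the "strong predictions" concrete, here is a minimal sketch of a single-gene cross. It assumes one gene with a dominant allele `A` and a recessive allele `a`, and that each parent passes on either allele with equal probability. The function names `cross` and `dominant_fraction` are made up for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent_a: str, parent_b: str) -> dict:
    """Return genotype -> probability for a single-gene cross.

    Each parent is a two-letter genotype such as "Aa". Every allele is
    assumed to be passed on with equal probability (Law of Segregation).
    """
    # Sort the alleles so that "aA" and "Aa" count as the same genotype.
    counts = Counter("".join(sorted(pair)) for pair in product(parent_a, parent_b))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def dominant_fraction(parent_a: str, parent_b: str, dominant: str = "A") -> Fraction:
    """Fraction of offspring that show the dominant trait."""
    offspring = cross(parent_a, parent_b)
    return sum((p for g, p in offspring.items() if dominant in g), Fraction(0))


# Crossing two hybrids: three quarters of the offspring show the dominant trait.
print(dominant_fraction("Aa", "Aa"))  # 3/4
```

This is of course the classic 3:1 ratio, here derived by simply enumerating the four equally likely allele combinations.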

With these laws, biology suddenly became of interest to physicists and those of a more mathematical bent.

Physics has been one long series of unifications where initially messy phenomena could, once understood, be reduced to clear cut formulas that brought clarity to our world. One very familiar unification is Isaac Newton’s formulation of the laws of gravity and motion. First there were rules for describing the orbits of planets precisely, rules which offered no explanation for these motions. After Newton invented his famous laws, which explained the movements of moons and planets, the fall of an apple turned out to be ruled by those exact same formulas.

Further unifications then showed that all matter was composed of a manageable number of different atoms, which in turn were found to be composed of just three elementary particles, the proton, the electron and the neutron.

More sophisticated experiments created all kinds of new elementary particles, leading one physicist to quip “Who ordered that?”, but after some decades order was restored and many of these new particles and the old guard turned out to be created out of just 3+3 different smaller things called quarks. The rest also neatly fit in a hierarchy.

Standard model of elementary particles. Graphic by Wikipedia user MissMJ

With some further discoveries, all particles could then be described in an attractive table that left some holes for as yet undiscovered particles, and lo, in due time all of these were found to actually exist, with the last gap closed when the Large Hadron Collider at CERN unambiguously found the Higgs boson - to no one’s particular surprise.

Physics envy

This steady march of wrapping ever more complex things into simpler models that can then be used to predict the world is very seductive and has led to the phenomenon of “physics envy”. Keen observers of history will note that this seemingly perfect progress of science actually did not happen quite as smoothly, but in hindsight it surely looks attractive.

Physics and biology met during the second world war when Erwin Schrödinger gave a series of lectures in Dublin called “What is life?” in which he derived the nature of the “heredity molecule” from first principles, apparently without knowing that biologist H.J. Muller had already done so in 1929. He also got it pretty wrong, but in the words of a great biologist-philosopher, Schrödinger made a fool of himself in the name of science, and set the agenda for decades of research.

Following this kickoff, physicists and mathematicians arrived in droves to opine on the as yet unknown mechanisms of heredity. As more became known about the nature of DNA, things appeared to be progressing in a remarkably physicsy way.

Life was known to consist of innumerable different proteins, and these proteins turned out to consist of only 20 different molecules called amino acids. These were duly identified, studied and named.

In the course of understanding DNA, it became clear that DNA itself consisted of four different components, which were patiently isolated. The four DNA “nucleotides” were named after where they were first isolated - adenine (A) from the pancreas gland, cytosine (C) from the cell, guanine (G) from Guano bird droppings, thymine (T) from the Thymus gland.

Subsequently, it was found that life features a (near) universal table of how groupings of three nucleotides encode for amino acids. Much like two “up” quarks and one “down” quark will always create a proton, a triplet (or ‘codon’) of Guanine, Thymine and Cytosine will always lead to a Valine amino acid, in all forms of life, from the lowliest virus to the mightiest tree.
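The table-like nature of this code is easy to show in a few lines. The sketch below contains only a tiny excerpt of the standard codon table (six of the 64 entries), written in the DNA alphabet. The names `CODON_TABLE` and `translate` are invented for this example, and real translation involves far more machinery than a dictionary lookup.

```python
# A deliberately tiny excerpt of the standard codon table (DNA alphabet).
CODON_TABLE = {
    "ATG": "Met",   # also the usual start codon
    "GTC": "Val",   # the Guanine-Thymine-Cytosine triplet from the text
    "GGC": "Gly",
    "AAA": "Lys",
    "TGG": "Trp",
    "TAA": "Stop",
}


def translate(dna: str) -> list:
    """Translate a coding sequence codon by codon until a stop codon."""
    protein = []
    # Ignore any trailing bases that do not form a complete triplet.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        # Raises KeyError for codons outside this small excerpt.
        amino_acid = CODON_TABLE[dna[i:i + 3].upper()]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


print(translate("ATGGTCAAATAA"))  # ['Met', 'Val', 'Lys']
```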

Various things in DNA have also received particle-physics sounding names like intron, codon, cistron, operon and exon (compare proton, neutron, hadron etc).

The parallels with physics are astounding, and leading lights like Von Neumann soon started offering their thoughts and calculations on how life might work.

Further successes

After further literal millions of person-years of investigation, many of the fundamentals of life are now clear to us. DNA encodes for RNA, RNA encodes for amino acids, and many amino acids together form proteins.

We now also know that proteins interact with both RNA and DNA, and that proteins can cause a part of DNA to be more, less, or not converted into amino acids.

In addition, proteins can change shape based on temperature, presence of light, acidity or many other factors. Crucially, in one shape, the protein may influence DNA, and in another shape it may not (or in a different manner).

In this way we understand how the complex of RNA, DNA, amino acids and proteins forms algorithms that sense the world and react to it. It is a veritable programming language.

These lessons have delivered some miraculous results. For example, unreasonably early, biologists figured out the constituent amino acid components of the insulin hormone, so vital for treating diabetes. Once this was understood, it became possible to inject specifically encoded DNA into bacteria to create human insulin, which turned out to work better than the previously used animal insulin.

Decades later, the much advanced understanding of molecular biology made it possible to tease out enough of the workings of the highly complex hepatitis C virus to develop a definitive cure, delivered in a few pills. Similarly, HIV is now understood well enough that it seems certain we’ll beat it permanently in the near future.

The “Who ordered that” moment

In biology, ‘physics’ thinking continued to hold sway, with for example the ‘gene’ given the elementary particle treatment, with talk of ’the gene for brown eyes’ or ’the longevity gene’. Over the decades however it has become clear that most genes have far from one function, and that almost every biological feature comes from dozens or hundreds of genes. And in fact, we are no longer even very sure how many genes a human has, with the initial estimate of over 100,000 now being decreased to “around 20,000”.

While there have been isolated and encouraging breakthroughs, the vast majority of “life” remains very poorly understood. Physics has gotten very far by incessantly studying isolated molecules, atoms or particles until the workings under ideal conditions are understood, and later generalizing that understanding to higher levels.

As an example, the isolated hydrogen atom is now understood to 13 decimal positions, and from that understanding we have built and honed most of quantum theory.

In molecular biology, we could call the “lac operon” the hydrogen atom of life. The lac operon is a set of 3 genes that together regulate lactose metabolism in many bacteria. Lac has been studied to within an inch of its life. It is literally a textbook case, in the sense that no molecular biology textbook skips treating this operon.

Lac is in a sense a thermostat, one that springs into action when glucose levels are too low, much like a heating thermostat is enabled by a drop in temperature.

In practice however it turns out lac does not trigger on a lack of glucose - it senses an excess of a chemical that builds up when glucose is no longer being transported into the bacterium. This kind of thing is rife in life and clearly shows how life has not been “designed” by any human kind of designer.

Although lac has now been studied for six decades, it is increasingly clear that the textbook picture of its operation (‘a protein changes conformation due to buildup of a secondary chemical, thus stimulating the conversion of lactose into glucose, if lactose is available’) is woefully incomplete.
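To show just how compact that textbook picture is, here it is as a caricature in code. This is a sketch of the simplified classroom model only, reduced to two yes/no inputs. The function name and the three output levels are my own shorthand, and as noted, the real operon is far richer than this.

```python
def lac_expression(lactose_present: bool, glucose_present: bool) -> str:
    """Textbook caricature of lac operon activity.

    Returns "off", "low" or "high". This is a toy model, not a
    quantitative description of the real regulatory system.
    """
    if not lactose_present:
        # Nothing to digest: the repressor protein keeps the genes shut.
        return "off"
    if glucose_present:
        # Lactose is available, but the preferred fuel is too, so the
        # secondary "low glucose" signal is missing and activity stays low.
        return "low"
    # Lactose available and the low-glucose signal has built up.
    return "high"


for lactose in (False, True):
    for glucose in (False, True):
        print(lactose, glucose, lac_expression(lactose, glucose))
```

Four input combinations and three lines of logic: that is the whole textbook model, and it is exactly this kind of neat summary that turns out to leave most of the story untold.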

It is likely possible to write a 1000 page textbook that covers nothing but the very elementary lac operon.

Might biology be too complex for the “physics” treatment?

The gathering of facts in biology continues apace. The US NCBI maintains the PubMed database of all biomedical and life science articles. When I joined TU Delft’s bionanoscience department in 2013, I clearly remember PubMed containing 22 million articles. Today in 2019, the number stands at 30 million. This boils down to over 100 new articles an hour, 24 hours a day, 365 days a year.
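For those who want to check the arithmetic, here is the back-of-the-envelope calculation, using only the two PubMed counts quoted above and assuming growth was spread evenly over the six years:

```python
articles_2013 = 22_000_000
articles_2019 = 30_000_000

years = 2019 - 2013
hours_per_year = 24 * 365

# Average number of new articles per hour over the whole period.
per_hour = (articles_2019 - articles_2013) / (years * hours_per_year)

print(round(per_hour))  # 152
```

So "over 100 an hour" is, if anything, on the conservative side.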

When I read through recent research, I see researchers finding out ever more about ever more things. It appears research is still operating on the hope that at the end of this tunnel lies some kind of understanding so we can “write the book on x”, where x might be something like “vision” or “sense of touch”.

Physics has long been guided by the belief that after sufficient observation we will find an underlying law. Up until very recently, that law also seemed all but guaranteed to be a “pretty” or even beautiful synthesis.

Note: it is somewhat ironic that physics itself is currently “stuck” on an unattractive but highly effective theory (the “standard model”) and that nature is steadfastly refusing to cooperate with a newly designed and much prettier supersymmetric model.

One of the most heavily researched subjects is diabetes, which affects affluent Americans and Europeans in huge numbers. This generates vast amounts of funding. But despite literally millions of pages having been written on this important subject, we have so far not succeeded in explaining the most basic of things about diabetes, like why it is suddenly surging.

Treatments have similarly only advanced piecemeal, even though we’ve gone down to tremendous depths in the insulin rabbit hole.

Similarly, progress in understanding Alzheimer’s disease has been very disappointing, with all leading theories now having bitten the dust, even though it has been possible to create and trial (unsuccessfully) dozens of medicines based on the insights gathered so far.

To be very clear, this is not because of lack of effort or because of “bad science”. We have learned stunning amounts about the etiology of many diseases, but a unified and actionable theory has so far eluded us for many of them. But the brilliance that has been applied to the problems is very impressive.

Might biology be too complex for our understanding?

This is quite a profound question. I believe that it is likely that parts of biology will not lend themselves to a “physics”-like understanding, where wise people can sit down and explain what is going on. In physics, very high levels of understanding have been possible because of abstraction.

For example, the core expression of general relativity is quite short:

\[ G_{\mu \nu} + \Lambda g_{\mu \nu} = \frac{8 \pi G}{c^4} T_{\mu \nu} \]

However, if we were to write out the meaning of this formula in simpler mathematics (which we might need to do to perform actual calculations), it would run to hundreds of pages.

The only reason the big brains of general relativity can reason about their subject is because they created syntax, concepts and words that operate at a higher level. General relativity was amenable to that.

But if things had been different, it might well have been the case that general relativity simply exceeded the capacity of human minds to comprehend it all. It could be argued that the combination of general relativity and quantum mechanics is actually beyond this point - humanity has not been able to conjoin these two theories into one, even though both describe the same things.

There is no rule that says nature cannot be more complex than our brains can handle. And after billions of years of evolution, why should it be simple?

Given the 100 articles being written every hour on (medical) biology, I find it entirely possible that life might be so complex it will similarly exceed our capacity for understanding it in a “physics” like way.

Yet this still appears to be what we are aiming for.

But then what?

If we concede that biology should exit ‘physics mode’, several very good options remain. As a case in point, the hepatitis C cure was achieved by understanding ’enough’ of hepatitis C to figure out which key proteins to block. Opportunistic research that does not reach full understanding can be exceptionally valuable, with hepatitis C cures now achieving near 100% effectiveness, even though the mechanism of action of the key protein (NS5A) is poorly understood.

Secondly, we may have to admit that no human being can understand even a tiny fraction of the 100 papers being published every hour. It is now also widely understood that many scientific articles have more authors than readers, which is a damn shame.

Although biology will perhaps never achieve the abstraction levels of general relativity (which compacts 100s of pages of math into one neat formula), it may instead be possible to no longer write articles for low numbers of human readers, but instead work on encoding the findings therein for machine consumption.

Even with limited and realistic expectations for machine learning, it is entirely conceivable that a computer that does have the ability to ingest the findings of 100 papers/hour will be able to find combinations and conclusions that so far elude our thinking.

Isn’t this already happening?

Bioinformatics fills database after database with annotations and conclusions. There are however very large worries about the accuracy of these databases. They are clearly “after the fact”. The currency of academia is the scientific paper, even if it only gets read by three people. Eventually the results of the paper may percolate to the databases, but this is very much a secondary thing.

I would argue that this situation should be reversed - the prime result of research should be very well annotated entries in knowledge bases, which then reference the article for further background. But the database is what counts, and not the other way around.
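What might such a first-class entry look like? The sketch below is purely illustrative: the `Finding` class, its field names and the placeholder article identifier are all hypothetical, not an existing schema or database format. The point is only that the claim itself is structured data, and the article is a reference hanging off it.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Finding:
    """One machine-readable claim, with the paper as supporting background."""
    subject: str
    predicate: str
    obj: str
    evidence: str            # how the claim was established
    source_article: str      # pointer back to the paper (hypothetical id format)
    tags: list = field(default_factory=list)


# Example entry, filled with the textbook lac summary from earlier in this post.
finding = Finding(
    subject="lac operon",
    predicate="is activated by",
    obj="lactose in the absence of glucose",
    evidence="textbook summary, not a measured result",
    source_article="EXAMPLE-ARTICLE-ID",  # placeholder, not a real reference
    tags=["gene regulation", "bacteria"],
)

# Serialise to JSON so that software, not just people, can consume it.
record = json.dumps(asdict(finding), indent=2)
print(record)
```

A real knowledge base would need controlled vocabularies and a lot of curation, but even this much structure is more than a PDF offers to a computer.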

Fundamental research

As noted, the very deep roots of life, the workings of DNA, RNA and amino acids are universal. Even the simplest forms of life exhibit this full machinery. And this area does lend itself well to fundamental research in hopes of gaining a unified understanding - because most of it IS in fact unified.

For example, in my own old department of bionanoscience, researchers are doing typical ‘physicist’ things like studying idealized forms of DNA, for example, living DNA unconstrained by cell walls.

Because this kind of research can study specific aspects in isolation, it offers a higher chance of actually finding a “pot of gold” on the other side of the rainbow and delivering a deeper and unified understanding of life.

Summarising

Biology, life, DNA and physics met in the 1940s, and the initial successes of working out the basic molecular principles of life were very ‘physics’-like. This led to hard and fast rules and even a universal codon table of life, covering billions of years of evolution with nary a change.

Biology has since turned into a gigantic fact-finding mission, assembling 100 articles every hour, full of new and recycled observations. This stream of words vastly exceeds what any human being could possibly comprehend, even when concentrating on a sliver.

A real question meanwhile is whether we can hope that all these facts together will lead to a Newton-style synthesis of new unified understandings of life, and whether we’ll create a syntax and language allowing us to speak about life on a higher level, much as has been achieved by the mathematical notations found in general relativity.

If it turns out this answer is no, and we know that there is too much unsynthesised data for humans to make sense of, it may be better to 1) not attempt such global understanding, but see how far we get with knowing some specific things, and 2) gather everything we learn into first-class quality databases that might enable computers to make sense of what we have learned.

Meanwhile, fundamental research into universal (and perhaps idealized) aspects of life offers a higher possibility of finding deep truths and laws.