If you want to know how to design a complex computational system, the human brain is a great place to take inspiration. Even though our brains have evolved in an environment that constrains them to using biological matter as a substrate, they do an excellent job of navigating these constraints efficiently.
Understanding brains more deeply can provide a myriad of insights into the first principles of good design when it comes to computational architectures.
In Principles of Neural Design (Sterling/Laughlin) the authors take a look at several biological architectures for computation, from bacteria to big brains, and try to reverse engineer the principles of efficient design.
They start by discussing the computational properties of an E. coli bacterium, building up towards increasingly complex systems to develop guiding principles that explain what we see in the biological world.
Chapter 4 of the book, “How Bigger Brains Are Organized”, discusses the human brain’s ability to compress information and synthesize only what is necessary to activate the proper response. Compression and computational efficiency play a key role in allowing our brains to make sense of unfathomably large amounts of incoming information with relatively little wetware: “the most efficient designs will send only information that is essential and will send it at the lowest rate allowable to serve a given purpose. If information can be sent without any wire at all, that is best. If wires are absolutely needed, they should be as short and thin as possible. These principles allow substantial insight into how bigger brains are organized.” (p. 26)
Drawing on human brains as a success story for computational efficiency, it seems reasonable to conclude that, as designers, our ability to compress information intelligently corresponds directly to the strength of our silicon-based computational systems.
Keys, Keyholes, and Locks
I like to use the metaphor of a key, a keyhole, and a lock when thinking about the transfer of information.
In this example, a key is an input of information. Keys have built-in characteristics, either ridges or dimples, so they can be read as distinct entities that can turn a lock ON/OFF if they meet the proper physical requirements. Each ridge or dimple is itself a store of information with a distinct form, and when coupled together, these stores of information combinatorially increase the number of ways one key can be distinguished from another.
A keyhole is a receptor that also has a specific form. Similarly to a neural receptor, a keyhole has a distinct topography which can only be matched by the correct input characteristics. When the space that it offers is appropriately matched by an input, it will activate an encoded response.
A lock is an output that has two states. When the input successfully fits into the receptor, the lock is activated to turn ON/OFF.
This interaction between two specific forms, the input and the receptor, which in coordination can lead to an activated response, allows our metaphor to stay consistent with Claude Shannon’s theory of information. The essence of this theory, as applied here, is that the brain receiving inputs already has expectations about what those inputs will look like, and that the amount of information communicated to your brain by a signal can be measured by the narrowing of the probability distribution over the set of possible messages being sent.
Think of a key as sending a message, where each ridge or dimple carries one part of the overall message. The amount of information that can be sent can be measured by how narrow the set of keys that work is relative to the set of all possible keys. Each additional dimple or ridge allows for further narrowing of that space, which in turn allows more information to be carried. As the key goes further into the keyhole, the possible messages being sent become increasingly constrained by the shape of the topography, until the required combination of input characteristics is highly differentiated and the lock is activated.
Let’s imagine, for example, a key with 10 dimples that each vary in size from 1 to 10, and a corresponding keyhole that can read the size of each dimple in order to allow entry. The number of possible distinct forms for this arrangement would be 10^10. A key in this arrangement carries much more information about its state than a key with 5 dimples that vary in size from 1 to 4 (4^5).
Furthermore, the number of distinguishing characteristics (dimples) in this case has more of an effect on how much information can be carried than how distinguishable those characteristics are (the number of sizes each dimple can take). For example, an arrangement of 5 dimples varying in size from 1-4 allows for more combinations (4^5 = 1024) than an arrangement of 4 dimples varying in size from 1-5 (5^4 = 625).
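To make the arithmetic concrete, here is a minimal Python sketch that counts the distinct keys for each arrangement mentioned above and converts that count into bits. The function names `key_space` and `bits` are my own, and the bit count assumes every key is equally likely, which the text does not state.

```python
import math

def key_space(num_dimples: int, sizes_per_dimple: int) -> int:
    """Number of distinct keys when each dimple independently takes one of the sizes."""
    return sizes_per_dimple ** num_dimples

def bits(num_dimples: int, sizes_per_dimple: int) -> float:
    """Bits needed to single out one key, ASSUMING all keys are equally likely."""
    return num_dimples * math.log2(sizes_per_dimple)

# The three arrangements from the text: (dimples, sizes per dimple).
for dimples, sizes in [(10, 10), (5, 4), (4, 5)]:
    print(dimples, sizes, key_space(dimples, sizes), round(bits(dimples, sizes), 2))
```

Under that assumption, the bit count grows linearly with the number of dimples but only logarithmically with the number of sizes per dimple, which is one way to see why adding dimples tends to matter more than adding sizes.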
We could numerically represent an example of a topography for each example, where the topography needs the following combinations of characteristics from the input in order to open:
7 5 1 2 1 2 3 8 7 10
4 3 1 1 4
5 2 2 4
Each string of digits can represent either our receptor’s topography or our key’s characteristics, where both entities need to match in order to activate the lock.
Our metaphor of key combinations gives us a foundation for what information or messages could look like in deconstructed form, namely a string of digits.
This is analogous to the kind of messaging that our computational architectures use: binary code. Note that a binary key could be represented by the presence or absence of a dimple at each node on the face of the key, which could result in a code such as 011011010101 that is necessary to activate a switch.
But let’s go back to the human brain and discuss how it is able to use a multidimensional topography in order to receive and synthesize messages from different sensory inputs simultaneously and activate useful responses.
Imagine that you are in the jungle and that you sense the presence of a dangerous predator nearby. On a subconscious level, the first sensory input you might receive is a smell. Being unsure of exactly what message is being sent by a whiff of animal odor, your brain may represent the state of danger as a probability distribution in which, in most cases, the odor does not indicate any immediate danger.
However, you may suddenly hear the rustling of leaves in the distance. By itself this would not have spiked your adrenaline levels, but when that input is synthesized in the context of having sensed the smell, the probability distribution over the scenarios you could be finding yourself in starts to narrow dramatically. Now here comes the big spike in adrenaline: you turn your primary input-receiving organ towards the origin of the sound and, sure enough, you can barely make out a vague outline in the form of a large predator. In isolation, none of these sensory experiences may have updated the probability of danger in your internal model significantly. However, because these experiences are happening in conjunction with one another, the chance that all of them are being misperceived, in this exact order, becomes very small, and the adrenaline is immediately activated.
In this way, the brain is able to compress information highly effectively, relying less on each individual neural topography to alter its probability distribution. It is both diversifying itself against the risk of responding incorrectly to misperceptions or biases within those topographies, and also combinatorially reducing the number of possible states that are being represented very quickly. If we go back to our concept from information theory, namely that the amount of information being sent can be represented by the narrowing of the probability distribution over the set of possible messages that are being sent, we see that this multidimensional architecture is quite useful for compressing large amounts of information about the world state.
A ten percent chance of having smelled a predator nearby, in conjunction with a twenty percent chance of having heard a predator nearby, in conjunction with a thirty percent chance of having seen a predator nearby could be read at face value as three separate probabilities of there being no danger, multiplied together (0.9 * 0.8 * 0.7), giving around a 50% chance that there is nothing to worry about.
However, that would be a ridiculous conclusion. The specific combination of smelling, hearing, and seeing a potential predator, in that order, is more like a key combination in a very large space of possible coexisting sensory inputs, and there is a correspondingly high likelihood that these three potentialities existing simultaneously is a real indication of danger.
The conditional probabilities, like everything else your brain is processing, are intrinsically connected. Before, there may have been an 80% chance that this rustling sound did not indicate danger, which could be represented as a probability of 0.8. Now that it comes in the context of also having sensed a smell that could be sending the same message, it becomes far less likely that there is no danger to respond to. This is because the entire set of messages that the brain could receive after having smelled a potential predator is very large, and the specific message of hearing a potential predator is a relatively small part of that probability space. Accordingly, the updated chance that this rustling sound does not indicate danger is correspondingly small.
The same goes for the visual input, and when the three are combined, the “part of the game tree” that we are in is highly representative of danger and extremely likely to benefit from increased adrenaline being pumped into our body.
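One simple way to put numbers on this contrast is Bayes' rule in odds form, sketched below in Python. This is a swapped-in illustration, not a claim about how the brain actually computes: the prior of 0.01 and the likelihood ratios of 5, 8, and 20 are invented for the example, and the cues are assumed to be conditionally independent given whether a predator is present. Only the 10%, 20%, and 30% figures come from the text.

```python
def naive_no_danger(cue_probs):
    """Face-value reading: multiply each cue's standalone chance of 'no predator'."""
    p = 1.0
    for q in cue_probs:
        p *= 1.0 - q
    return p

def posterior_danger(prior, likelihood_ratios):
    """Bayes' rule in odds form.

    Each ratio is P(cue | predator) / P(cue | no predator). Multiplying them
    ASSUMES the cues are conditionally independent given the true state.
    """
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)

# Face-value product from the text: 0.9 * 0.8 * 0.7.
print(round(naive_no_danger([0.1, 0.2, 0.3]), 3))

# HYPOTHETICAL numbers: predators are rare, and each cue (smell, sound, sight)
# is several times more likely when one is actually present.
prior = 0.01
ratios = [5.0, 8.0, 20.0]
for n in range(len(ratios) + 1):
    print(n, "cues ->", round(posterior_danger(prior, ratios[:n]), 3))
```

With these made-up inputs, each cue on its own leaves the probability of danger fairly low, while the three together push it close to certainty. Different priors and ratios would give different numbers, but the compounding effect is the point.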
Deterministic vs. Probabilistic processing
What I have been calling a receptor is usually referred to in computation as a processor: that which processes inputs in order to identify what messages they are sending and activate the corresponding output.
One distinction that I would like to draw here is between a deterministic, or dualistic, method of processing input information and a probabilistic, or non-dualistic, one.
In the case of keys and number strings, at each stage, unless the next node of information matches the requested message, the computational process will deem the input unfit for activating a function. In this way it is inherently dualistic.
In our previous example we had a requisite key string of:
7 5 1 2 1 2 3 8 7 10
If you try to enter a set of information that differs only in the first digit, it is as unrecognizable to the receptor or processor as any other random string:
6 5 1 2 1 2 3 8 7 10
You will not make it past the first node. The physical analogue would be not being able to push the key in at all because the first ridge or dimple is the wrong size.
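The deterministic behavior described here can be sketched in a few lines of Python. The names `first_mismatch` and `lock_opens` are hypothetical, chosen for illustration. The processor walks the key node by node and gives up at the first value that does not match the topography.

```python
from typing import Optional, Sequence

# The required key string from the example above.
TOPOGRAPHY = (7, 5, 1, 2, 1, 2, 3, 8, 7, 10)

def first_mismatch(key: Sequence[int], topography: Sequence[int]) -> Optional[int]:
    """Return the index of the first node that fails to match, or None if the key fits."""
    for node, (got, wanted) in enumerate(zip(key, topography)):
        if got != wanted:
            return node
    if len(key) != len(topography):
        # A key that is too short or too long fails at the first unmatched node.
        return min(len(key), len(topography))
    return None

def lock_opens(key: Sequence[int], topography: Sequence[int] = TOPOGRAPHY) -> bool:
    """Binary outcome: the lock turns only on an exact match."""
    return first_mismatch(key, topography) is None

attempt = (6, 5, 1, 2, 1, 2, 3, 8, 7, 10)
print(lock_opens(TOPOGRAPHY))               # exact match
print(lock_opens(attempt))                  # off by one at the first node
print(first_mismatch(attempt, TOPOGRAPHY))  # index of the node where it fails
```

Note that the processor learns nothing about how wrong the attempt was. A near miss and a random string both produce the same rejection.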
But what we really have here is a string of 10 numbers which is, probabilistically speaking, as close as it could possibly be to the correct representation without being identical to it: a single digit in the string is one value away from being correct.
From a binary perspective of whether this string is correct or not, it is one of 9,999,999,999 incorrect strings (10^10 possible strings, minus the single correct one).
However, looking more deeply we can see that it is in a rather privileged set of at most 20 incorrect strings in which only a single digit is off, and by exactly one. (For this particular key the set has 17 members, because the digits 1, 1, and 10 sit at the edge of the range and have only one neighboring value each.)
By identifying how closely this string resembles the solution we’re looking for, we can classify this string as being exceedingly rare (fewer than 20 in 9,999,999,999) in terms of closeness to our solution. This may inform a conclusion, for example, that the key is very likely to have been damaged or input incorrectly.
What we are doing is providing context for each message that is being sent on each node of our string. Instead of giving a binary response of whether the value on node 1 is correct/incorrect, we are seeing how close it is to being correct and what values it is surrounded by. In essence, this would be a similar process to what we discussed earlier with contextualizing sensory inputs that are being received in conjunction with each other.
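As a rough sketch of what "seeing how close it is" could look like, the Python below scores an attempt by the sum of per-node differences and enumerates every string that is off by exactly one step at exactly one node. The distance measure and the names `distance` and `near_misses` are my own choices for illustration. It also assumes dimple sizes are limited to 1-10 with no wraparound.

```python
TARGET = (7, 5, 1, 2, 1, 2, 3, 8, 7, 10)
MIN_SIZE, MAX_SIZE = 1, 10  # ASSUMED size range, with no wraparound at the edges

def distance(key, target=TARGET):
    """Sum of per-node differences; 0 means an exact match."""
    return sum(abs(k - t) for k, t in zip(key, target))

def near_misses(target=TARGET):
    """All strings that differ from the target at one node, by exactly one step."""
    target = tuple(target)
    found = []
    for i, value in enumerate(target):
        for step in (-1, 1):
            neighbor = value + step
            if MIN_SIZE <= neighbor <= MAX_SIZE:
                found.append(target[:i] + (neighbor,) + target[i + 1:])
    return found

attempt = (6, 5, 1, 2, 1, 2, 3, 8, 7, 10)
total_incorrect = MAX_SIZE ** len(TARGET) - 1

print(distance(attempt))           # how far the attempt is from the target
print(attempt in near_misses())    # whether it belongs to the privileged set
print(len(near_misses()), "of", total_incorrect, "incorrect strings are near misses")
```

Under these assumptions the near-miss set for this key has 17 members out of 9,999,999,999 incorrect strings. Twenty is the ceiling, reached only when no digit sits at an edge of the range. A probabilistic processor could use a score like this to treat a near miss as a probably damaged key rather than as noise.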
Notes on compression, non-dualism, and quantum theory
This could be expanded upon further, but it is worth noting that as our understanding of the physical universe has progressed in modern times, it has shifted from a dualistic understanding of the existence/nonexistence of particles to the non-dualistic, probabilistic representation of particles used in quantum mechanics.
A unifying theory of physics attempts to break down the fundamental building blocks of our universe into their most basic forms. As you zoom into what makes up the matter in our universe, things get smaller and smaller in size. I do not have a strong technical understanding of quantum mechanics, but my model of the universe is that reduction in size reaches a physical limit, where in order to further reduce, you need to have an entity represented probabilistically in a state of existence/nonexistence.
I personally think it is reasonable to assume that complete nothingness, nonexistence without potentiality, is unobservable in the physical universe and a nonstarter philosophically. Matter may only approach nothingness asymptotically without ever reaching it.
Okay, went off the deep end a little bit there.
So what implications does this have for information, compression, and designing computational architectures that are useful?
Well, as we discussed earlier in the example of being stalked by a predator, if you have multiple dimensions of information processing, even small approximations of certain messages being carried can be enough, when received in conjunction, to drastically narrow the probability distribution used to determine the proper response.
That means that receptors can be much more sensitive to information. Think of information as being inherently irreducible: a signal may grow more and more imperceptible the farther away it is, or may only light up very infrequently, but it is still perceptible in principle. The ability to synthesize this information could serve as a form of compression, whereby small indications of messages that take up less “hard drive” space can be analyzed in conjunction with each other to draw novel conclusions about meaning. This would allow an architecture to accurately output responses using very little input information, which could be classified as a form of sensitivity.