Neural net models that are loosely inspired by animal nervous systems have been around for many years. They are made up of many nodes (neurons) that do a computation of some kind on the sum of the signals coming in to them. The incoming signals travel along connections that are weighted, so some signals are magnified, some are reduced. Signals on a negatively weighted connection inhibit the neuron they come into.
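As a minimal sketch of the neuron described above (the threshold and weight values are made up for illustration, and real models use many other kinds of computation), a node can sum its weighted inputs and fire if the total is large enough:

```python
def neuron_output(signals, weights, threshold=0.5):
    """Fire (1) if the weighted sum of incoming signals reaches the threshold."""
    total = sum(s * w for s, w in zip(signals, weights))
    return 1 if total >= threshold else 0

# Two excitatory connections and one inhibitory (negatively weighted) connection.
weights = [0.4, 0.4, -0.6]

print(neuron_output([1, 1, 0], weights))  # only excitatory inputs active: fires (1)
print(neuron_output([1, 1, 1], weights))  # inhibitory input also active: silent (0)
```

The third connection shows the inhibition mentioned above: when it is active, it pulls the sum below the threshold.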
Weights can be thought of as an encoding. For instance, let’s say one neuron represents yellow and another blue, and they both feed to a third neuron that weights them equally. The resulting neuron might then represent “green”.
If the weights on all the connections of the net are tweaked, you can get the net as a whole to recognize patterns. For example, it might output a ‘1’ every time you present its input layer with a picture of a zebra.
Nets can learn by supervision, where they are given a set of patterns with known answers and the discrepancy between their predictions and the real answers is corrected by adjusting their weights. They can also learn by unsupervised self-organization, where they might simply be given images and learn the characteristics of those images.
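To make the supervised case concrete, here is a small sketch using the classic perceptron learning rule. That rule is my choice for illustration, since it is about the simplest way to "correct the discrepancy"; larger nets normally use other methods such as backpropagation. The single neuron below learns the logical AND of two inputs:

```python
def predict(weights, bias, inputs):
    """Output 1 if the weighted sum plus bias is positive, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0

def train(examples, epochs=20, rate=1):
    """Perceptron rule: nudge each weight by (known answer - prediction) * input."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

# Patterns with known answers: the output should be 1 only when both inputs are 1.
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(examples)
print([predict(weights, bias, inputs) for inputs, _ in examples])  # [0, 0, 0, 1]
```

Whenever the prediction matches the known answer the error is zero and nothing changes, so the weights stop moving once every pattern is classified correctly.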
In recent years there have been two interesting theories that come with software anyone can download from the internet and experiment with. One theory is the Neural Engineering Framework (NEF) from Chris Eliasmith’s team at the University of Waterloo in Canada. They supply a simulation environment called Nengo: you give it the function you want it to compute, and it organizes groups of neurons to compute it in a biologically realistic way. Or you can devise neural circuits and see what they can do.
A second theory, called Hierarchical Temporal Memory (HTM), comes from Jeff Hawkins’s company, Numenta. So far they model just one layer of the neocortex, on the theory that much of the brain works on some basic principles, and those principles can be seen working even within one simple layer. Their software is also free to download, and is called NuPIC.
Both NEF and HTM theory assume a high dimensional space with sparse signals. (A low dimensional space is easy for humans to visualize, but above 3 dimensions it becomes impossible.) To review some basics: any cube has 3 dimensions with an x-axis, a y-axis and a z-axis. If there are only a few data points in the cube, the data is sparse. A vector can be looked at as an arrow that starts at the origin (the point 0,0,0) and reaches out to any point in the cube. A vector with more than 3 numbers has more than 3 dimensions. Very high dimensional spaces have counter-intuitive properties. For instance, evolution in biology has puzzles that only make sense when you look at gene mutations as exploring a high dimensional space. Numenta uses “Sparse Distributed Representations” (SDRs), which are basically high dimensional vectors of 1’s and 0’s (you can think of 1 as ‘on’ and 0 as ‘off’), where perhaps only 2 percent of the bits are 1.
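Here is a rough sketch of what an SDR looks like in code. This is not Numenta's implementation; the sizes (2048 bits, 40 of them on) and the names `cat` and `dog` are assumptions picked to match the "about 2 percent" figure. Each SDR is stored as the set of positions of its 'on' bits, and similarity is just the number of shared 'on' bits:

```python
import random

N_BITS = 2048    # dimensionality of the representation (assumed value)
N_ACTIVE = 40    # roughly 2 percent of the bits are on

def random_sdr(rng):
    """Return an SDR as the set of indices of its 'on' bits."""
    return set(rng.sample(range(N_BITS), N_ACTIVE))

def overlap(a, b):
    """Number of bits that are on in both SDRs."""
    return len(a & b)

rng = random.Random(0)
cat, dog = random_sdr(rng), random_sdr(rng)

# Corrupt 'cat': keep 30 of its on-bits and swap the other 10 for bits that were off.
kept = set(sorted(cat)[:30])
off_bits = sorted(set(range(N_BITS)) - cat)
noisy_cat = kept | set(rng.sample(off_bits, 10))

print(overlap(cat, noisy_cat))  # 30: still clearly recognizable as 'cat'
print(overlap(cat, dog))        # two unrelated random SDRs share very few bits
```

This is one of the counter-intuitive properties of high dimensional spaces: two random sparse vectors almost never collide by accident, so even a badly corrupted pattern remains far closer to its original than to anything else.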
Initially Numenta used their model to predict sequences, but recently it occurred to them that when you explore an object, say by touch (with your eyes closed), you may only handle small parts of the object at a time, but you get the general idea of its shape from the sequence of touches. With your eyes open you still explore objects sequentially, this time by vision; you just don’t realize it. Your eyes jump rapidly from place to place in a scene, assembling a model of it. Numenta has been making exciting headway on that front, and for a clear explanation of what they are attempting, see the IEEE article at:
Numenta’s learning algorithm is also interesting in that it doesn’t have connections that start out weak and then get stronger. Instead, they look at synapses as ‘either or’. Either two neurons are linked by a synapse that has grown strongly enough for them to communicate, or they do not communicate at all. This is based on the observation that synapses grow and fade in the brain. So in the model, a potential synapse might start out with a low “permanence” value, and if that value increases enough to rise above a certain threshold, then you have a connection. HTM theory has two self-limiting influences. First, it forces a maximum on the number of cortical columns that are active at any time, thus keeping the representation sparse. Second, it boosts any column that has had very little activity over a period of time, and conversely it inhibits any column that is active too often. By this balancing act, it increases the likelihood that every column in the cortex is involved in at least one representation.
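The permanence idea and the sparsity limit can be sketched in a few lines. This is a simplification under my own assumptions, not NuPIC's actual code: the threshold, the increment and decrement sizes, and the way the boost is applied (a simple multiplier on each column's score) are all hypothetical.

```python
CONNECTED_THRESHOLD = 0.5          # hypothetical values, for illustration only
INCREMENT, DECREMENT = 0.1, 0.05

def update_permanence(permanence, presynaptic_active):
    """Grow a synapse whose input was active, let it fade otherwise."""
    if presynaptic_active:
        return min(1.0, permanence + INCREMENT)
    return max(0.0, permanence - DECREMENT)

def is_connected(permanence):
    """A synapse either communicates or it does not; there is no in-between."""
    return permanence >= CONNECTED_THRESHOLD

def most_active_columns(overlaps, boosts, k):
    """Sparsity limit: only the k columns with the highest boosted score stay active."""
    scores = [o * b for o, b in zip(overlaps, boosts)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

# A potential synapse becomes a real connection after repeated activity.
p = 0.25
history = []
for _ in range(3):
    p = update_permanence(p, presynaptic_active=True)
    history.append(is_connected(p))
print(history)  # [False, False, True]

# Column 2 has been quiet for a while, so its boost lets it win a slot.
print(most_active_columns([5, 9, 2, 7], [1, 1, 1, 1], k=2))  # [1, 3]
print(most_active_columns([5, 9, 2, 7], [1, 1, 4, 1], k=2))  # [1, 2]
```

Note that the downstream computation only ever sees `is_connected`, a yes or no; the permanence value itself is just bookkeeping for learning.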
At this point I will only give some brief highlights, because both theories are clearly explained (usually without any difficult math) by papers and videos from both groups (I give links to their explanations below).
Eliasmith’s group has used its theory to build Spaun, the world’s largest functional brain model, using 2.5 million neurons to perform eight different cognitive tasks. For example, you can give it a picture of a number and it will recognize the number and write it out in script via a simulated 6-muscle arm. (A six-million-neuron version is expected to be available for download soon.) Spaun is just one of many models built with Nengo.
Two interesting insights they had were these.
1. A number can be represented, not just by one neuron, but by the average activity of a group of neurons. Even though each neuron either spikes or does not, outputting a ‘1’ or a ‘0’, the average of these 1’s and 0’s can be any fractional number. A scalar number is one dimensional, but you can represent two (or higher) dimensional numbers as well. For instance, you can have a group of neurons where each one represents a direction (North, South, West, East, Northwest, etc.) and each one fires most strongly in a preferred direction, somewhat less strongly in a nearby direction, and not at all in the opposite direction. If you have enough neurons to cover many directions, the activity of the group could represent an incoming signal that has both a direction and a magnitude. (The formal term for the response curve of a neuron, for instance a neuron that responds best to the color red, is its ‘tuning curve’. In this example each neuron is tuned to a different direction.)
The output of the group can recreate the input. To do this, you have to sum the outputs of all the neurons, with an optimal weighting on the output of each. For instance, if input ‘x’ feeds neural group A, weights coming out of ‘A’ can recreate input ‘x’. Then you can weight ‘x’ again to produce some other function, such as the square of the input, or you can combine it with outputs from other neural groups and feed them into a third neural group. In the illustration below, the picture on the left shows that the input x is recreated from neural group A by using weights to combine the outputs of the neurons in it, and then x is used again to drive neural group B. The illustration on the right shows that the intermediate step (of recreating x) is not necessary: with the proper weights you do the equivalent of creating ‘x’ and then creating B from ‘x’. We can think of the connections between the A neurons and one of the neurons of ‘x’ as a vector (since it’s a collection of numbers), and we can extend that to think of the connections of ‘A’ to all the neurons of ‘x’ as a matrix (or set of vectors stacked in rows). In fact, much of what NEF does is linear algebra.
2. One weakness of neural nets is that they don’t represent composite objects. A chair, for example, is a composite object with legs, a seat, and a back. However, Eliasmith’s group has looked at existing papers on the topic of “Vector Symbolic Architectures”, which explain how to make a vector in high dimensional space represent such an object. Suppose you had an arbitrary vector that represents the word ‘DOG’, another that represents ‘CAT’, a vector that represents the part of speech ‘VERB’, and a few more for other parts of speech. You can combine all these vectors by concatenation to represent the sentence “Dogs chase Cats”, but eventually your resulting vectors would get very large and very high dimensional. There are other ways to combine the parts of speech and the words so that your vectors do not get larger. You get a new vector, which does lose some information, but you can decompose it into its parts despite the loss. Because of the loss of information, there will be noise, but if you have a memory of what the vectors for “Dog” and “Cat” look like, you can clean up the noise and get your components back. Furthermore, the vectors maintain similarity, so that if “pink” and “red” have similar vectors, then “pink square” and “red square” will also have similar vectors.
“Vector Symbolic Architectures” allow for inductive reasoning as follows: You may remember puzzles where you see a series of pictures and try to guess what the next picture should be. It turns out that if these pictures are represented as vectors of symbol combinations, you can mathematically find the transform that leads one picture to the next, and then when you get to the final missing picture, just average out all those transforms (which themselves are vectors or matrices) and you will get a transform that will likely predict the correct missing picture.
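Here is a small sketch of the combine, decompose, and clean-up steps described in the second insight, using circular convolution as the combining operator. It is plain Python rather than Nengo, and the role names SUBJECT and OBJECT, the dimension of 512, and the helper names are my own assumptions for illustration.

```python
import random

DIM = 512
rng = random.Random(1)

def random_vector():
    """Random vector with elements drawn from N(0, 1/DIM), so its length is about 1."""
    return [rng.gauss(0, 1 / DIM ** 0.5) for _ in range(DIM)]

def bind(a, b):
    """Circular convolution: c[k] = sum over j of a[j] * b[(k - j) mod DIM]."""
    return [sum(a[j] * b[(k - j) % DIM] for j in range(DIM)) for k in range(DIM)]

def approx_inverse(a):
    """Reverse all but the first element; binding with this roughly undoes bind(a, .)."""
    return [a[-k % DIM] for k in range(DIM)]

def add(*vectors):
    return [sum(parts) for parts in zip(*vectors)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

names = ["DOG", "CAT", "CHASE", "SUBJECT", "VERB", "OBJECT"]
vocab = {name: random_vector() for name in names}

# "Dogs chase cats" as one vector that is no longer than any of its parts.
sentence = add(bind(vocab["DOG"], vocab["SUBJECT"]),
               bind(vocab["CHASE"], vocab["VERB"]),
               bind(vocab["CAT"], vocab["OBJECT"]))

# Ask "what was the subject?". The answer comes back noisy...
noisy = bind(sentence, approx_inverse(vocab["SUBJECT"]))

# ...so clean it up by picking the most similar vector we remember.
best = max(vocab, key=lambda name: dot(noisy, vocab[name]))
print(best)  # expected to be DOG
```

The `sentence` vector has the same 512 dimensions as `DOG` or `CAT`, which is the point: structure is packed in without the vector growing, at the cost of noise that the clean-up memory removes.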
You can read more on this at:
The University of Waterloo group also offers a course, and the notes are at:
You can get the program for free from http://nengo.ca
Numenta has a great explanation of their cortex simulation at:
If you prefer videos, they have a series that explains the theory well at:
It would be interesting to see if these two theories could be combined in some way. In the “Vector Symbolic Architectures”, you can represent the sentence “Birds fly” as “Birds * Position#1 * Noun + Fly * Position#2 * Verb” (the asterisk is an operator called circular convolution). The sentence can be decomposed into vectors that stand for its elements (such as ‘bird’). An object such as a “chair” could be decomposed into its 4 legs and its back and seat. So we could look at SDRs in HTM-theory and see if they too can be decomposed into elements.
HTM-theory can create a 3D model of an object when a cortex model is combined with a motor signal and a sensory signal. The motor signal says what to explore at any point (like a motor signal to move the eye to focus on a particular point on a sculpture that you are looking at), and the sensory signal says what the exploration is finding (the resulting signal from the new position where the eye is focusing). In this case a 3D model of the sculpture (in object-centered coordinates) will be built by the cortex. Hawkins says the representation is not like an image, but more like a 3D CAD model (CAD is computer-aided design). So I asked him (in a forum) if a hierarchy of elements could also be represented, as they can in “Vector Symbolic Architectures.” This is what he wrote:
Our new work on sensory-motor inference does move closer to your “composite” objects goal. Basically, in our model, objects are defined as a set of features at different locations on the object. The “features” are just SDRs and could in theory represent anything, such as another object.
So far we have been modeling one or more columns in a single region, that is, no hierarchy. In these models the only “features” that can be associated with a location are pure sensory features. I think we would need to have at least two levels in the hierarchy to achieve a composite object as you envision them to be. But the mechanism supports compositional objects.
So there are likely areas for additional discoveries that will come out of Numenta’s model. As for Eliasmith’s model, it’s safe to say that more discoveries will come out of it too. For instance, one could attempt to study neural disorders (and possibly mental disorders) by altering the behavior of the model’s neurons. They model a part of the limbic system (the basal ganglia) that, when it goes wrong, can lead to motivation disorders such as addiction, and when it works correctly can decide what to pay attention to. So (my guess is) this could lead eventually to a computer modeling a “train of thought”.
Both projects are worth exploring, and what’s great about the internet is that the software is free and you can participate. Both groups have forums which they monitor. If you are interested specifically in the sensory-motor advance of Numenta, you can ask questions at: https://discourse.numenta.org and if you are interested in the Neural Engineering Framework, you can ask questions at: https://forum.nengo.ai/.
That’s all for this post, but for people interested in side-details: this is how both models handle words and language as inputs:
Both models have to be attached to a source of inputs that end up as a series of numbers (or vectors). In fact, it could be argued our brain receives ALL information about the world as a series of numbers, via our vision, hearing, and so forth.
Vectors for words do not have to be arbitrary strings of on/off bits. One way to make them less arbitrary is to use a method called “Latent Semantic Analysis” that analyzes many documents, and finds that certain words occur together in certain types of documents, and so creates vectors for each word that preserve relationships. For instance, the vector for “Alzheimer’s” would be similar to the vector for “Plaque” because the two words tend to occur in the same documents. Eliasmith and Peter Blouw extended that idea to include some context information in the vector (such as position in the sentences where these words are encountered).
In the human brain, it was found with MRI that concepts really are organized semantically (and often physically). Here is one semantic map (you can see gallantlab.org/semanticmovies for more details):
Numenta’s software also needs inputs that are organized by meaning in some way. Even if you give it a series of numbers, you should encode those numbers as a series of bits so that the number ‘5’ is closer to ‘6’ than it is to ‘8’. For language inputs, they use a company (cortical.io) that makes a “semantic map” by looking at documents. Each word actually has coordinates in the map, and that means it can be represented by numbers. And they can handle sentences too: a vector for a sentence just adds up the vectors for those words. The summed vector doesn’t get too clogged with ‘on’ bits, because they force a limit arbitrarily on the number of bits that can be on. This can be done because often only a few bits are enough to recognize (or at least be reasonably confident of) the entire representation. So getting rid of bits is not as problematic as it might sound. A much better explanation than I can give is at: https://www.youtube.com/watch?v=HLuRQKzYbb8. And there are some interesting applications of their software mentioned on their website.
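Both ideas in that last paragraph are easy to sketch. The code below is not Numenta's or cortical.io's software: the block width of 4 bits, the function names, and the rule for deciding which bits survive (keep the bits that the most words agree on) are assumptions made to illustrate the principle.

```python
from collections import Counter

WIDTH = 4  # hypothetical number of consecutive 'on' bits used for each number

def encode_number(value):
    """Encode an integer as a block of consecutive on-bits starting at 'value'."""
    return set(range(value, value + WIDTH))

def overlap(a, b):
    """Number of bits that are on in both encodings."""
    return len(a & b)

def combine(word_sdrs, max_bits):
    """Add up the word SDRs, then keep only the max_bits most frequently set bits."""
    counts = Counter(bit for sdr in word_sdrs for bit in sdr)
    ranked = sorted(counts, key=lambda bit: (-counts[bit], bit))
    return set(ranked[:max_bits])

# Nearby numbers share more bits than distant ones.
print(overlap(encode_number(5), encode_number(6)))  # 3
print(overlap(encode_number(5), encode_number(8)))  # 1

# Three toy "word" SDRs summed into one "sentence" SDR capped at 3 on-bits.
words = [{1, 2, 3}, {2, 3, 4}, {3, 5}]
print(sorted(combine(words, max_bits=3)))  # [1, 2, 3]
```

Because the encoding of 5 overlaps the encoding of 6 more than it overlaps the encoding of 8, anything downstream that compares bit patterns automatically treats 5 and 6 as more alike.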