In a previous post, I wrote about Numenta, a company that attempts to model just one region of the cortex. Since the cortex looks very similar whether it is used for hearing, seeing, or other purposes, they believe there is a general basic algorithm that is used everywhere, so it makes sense to study the building block first.
Numenta makes its theory (Hawkins’ Hierarchical Temporal Memory – HTM) and algorithms available to the public.
In an article titled “Symphony from Synapses: Neocortex as a Universal Dynamical Systems Modeller using Hierarchical Temporal Memory”, Fergal Byrne described how a region of HTM layers could form a module corresponding to a brain region. This region has feedback: a signal from its output layer sends a branch back to its input layer, and so the region becomes a “dynamical system”. Dynamical systems can exhibit behaviors such as chaos, and can have attractors. Attractors can be thought of as basins in a landscape, which once entered, cannot be exited. Some attractors are much more complicated than basins, but there is still a bounded area that a point traveling on that landscape can enter, but not leave.
One HTM layer learns synaptic connections between neurons firing at a time (t), and a time (t+1). So the neurons learn to predict what will happen next. In a real cortex, we find that some regions detect only simple features (such as edges, corners and so forth), and others, further up in a hierarchy, detect complex combinations of features such as faces. So in a lower region, it may help in prediction to get feedback from a higher region. For example, if you know you are looking at a pyramid, that knowledge may help a lower level predict that if you view the pyramid from the top, you will see lines converging to a point.
So Fergal and others designed a new hierarchical machine that they called the “Feynman Machine”, and they formed a company named Ogma (their website is ogma.ai) to explore its potential. The Feynman Machine did not use HTM’s model of regions; instead it used “K-sparse autoencoders” as its building blocks, but the general idea was similar.
I will give a very general explanation of what they did here.
But first let’s go back to the article Symphony from Synapses, which was written before the Feynman Machine was created, and which uses HTM regions in its initial theory. Here Fergal starts off by explaining that our world is a world of dynamical systems, systems that are constantly changing and that produce a changing flow of information at our eyes, ears and skin.
In mathematics, some such systems are modeled as having an output that feeds back into the input. For instance, if you were modeling the reproduction of rabbits, you might start out with 2 rabbits, which you calculate should be able to have 10 offspring, and then you feed that number of rabbits (10) back into the equation to get perhaps 50 offspring in the next generation. You can plot the resulting rise in population, which does not go on forever, due to constraints such as limits on food, water, or shelter. In the diagram below (on the right) you see the curve of an equation that describes the growth of a population. You can see it levels off after some rapid growth.
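To make the feedback idea concrete, here is a minimal sketch of a population model whose output is fed back in as the next input. It uses the standard discrete logistic growth equation, which may or may not be the exact equation in the diagram; the growth rate (0.8) and the carrying capacity (1000 rabbits) are made-up numbers chosen only for illustration.

```python
def logistic_growth(initial, rate, capacity, generations):
    """Discrete logistic growth: each generation's output becomes the next input."""
    population = [float(initial)]
    for _ in range(generations):
        current = population[-1]
        # Growth slows down as the population approaches the carrying capacity.
        following = current + rate * current * (1.0 - current / capacity)
        population.append(following)
    return population


# Hypothetical numbers: 2 rabbits to start, room for about 1000.
rabbits = logistic_growth(initial=2, rate=0.8, capacity=1000, generations=30)
for generation in range(0, 31, 5):
    print(generation, round(rabbits[generation]))
```

The printed numbers rise quickly at first and then flatten out near the capacity, which is the "levels off after some rapid growth" behavior described above.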
Some dynamical systems are based on a few equations that depend on each other. Here is an example of a path (trajectory) traced around a “strange attractor”.
One characteristic of chaotic systems is that if you start your calculation with a number that is just slightly different than another, the paths they take eventually become very different. (In this picture, if you think of a ball sliding on a wire, the new ball’s start point might be slightly off the wire (on its own wire), and though it would travel for a while close to the other ball, eventually it would diverge dramatically).
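This divergence is easy to see numerically. The sketch below assumes the classic Lorenz equations with their usual textbook constants (the post does not say which system produced the picture, so that choice is mine) and uses simple Euler steps. Two balls start one millionth of a unit apart, and the code records the distance between them over time.

```python
def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the Lorenz equations by one small Euler step."""
    x, y, z = state
    return (
        x + dt * sigma * (y - x),
        y + dt * (x * (rho - z) - y),
        z + dt * (x * y - beta * z),
    )


def separation_over_time(start_a, start_b, steps):
    """Distance between two trajectories at every step, including step 0."""
    a, b = start_a, start_b
    distances = []
    for _ in range(steps + 1):
        distances.append(sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5)
        a, b = lorenz_step(a), lorenz_step(b)
    return distances


# The second start point differs only in the sixth decimal place of z.
distances = separation_over_time((0.0, 1.0, 1.05), (0.0, 1.0, 1.05 + 1e-6), steps=4000)
for step in (0, 1000, 2000, 3000, 4000):
    print(step, distances[step])
```

The distance stays tiny for a while and then grows until the two paths are as far apart as the attractor allows, even though both remain on the same bounded shape.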
The above picture shows 3 dimensions, but a Dutch mathematician named Floris Takens found that if you sample just one variable at regular intervals, enough times, you capture the information you need about the essential behavior of the whole system.
For instance, sampling one variable (x) of this 3D chart can do this:
The bottom chart is not charting X, Y and Z, because all you have is X. It is actually plotting X against past versions of X in the series you obtained. And in important respects, it models the original.
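Plotting X against past versions of X is usually called a delay embedding. Here is a rough sketch of the bookkeeping, again assuming the Lorenz equations as the hidden system. The embedding dimension (3) and the lag (8 samples) are arbitrary illustrative choices; in practice they have to be picked to suit the data.

```python
def lorenz_x_series(steps, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Simulate the Lorenz system but record only the x variable."""
    x, y, z = 0.0, 1.0, 1.05
    xs = []
    for _ in range(steps):
        xs.append(x)
        x, y, z = (
            x + dt * sigma * (y - x),
            y + dt * (x * (rho - z) - y),
            z + dt * (x * y - beta * z),
        )
    return xs


def delay_embed(series, dimension=3, lag=8):
    """Build points of the form (x[t], x[t - lag], x[t - 2*lag], ...)."""
    start = (dimension - 1) * lag
    return [
        tuple(series[t - k * lag] for k in range(dimension))
        for t in range(start, len(series))
    ]


xs = lorenz_x_series(3000)
points = delay_embed(xs, dimension=3, lag=8)
print(len(xs), len(points), points[0])
```

Each entry of `points` can be plotted as a 3D coordinate, and the resulting shape is the reconstructed attractor built from X alone.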
So Fergal suggests that our brains, not having access to all the variables that affect what we perceive, do this type of limited sampling, and from it reconstruct real-world dynamical systems.
One model in our brain can be coupled to another model, and influence it, and eventually via our motor actions, control real dynamical systems (like an arm pitching a baseball).
So the brain would be capturing rules, just like the rules that generated the strange attractor in the illustration, and not just learning a sequence such as links between still-frames of a movie.
One advantage of a model is that you can run a simulation forward in time to perform forecasting. If the simulation is incorrect, you can dismiss the difference as ‘noise’ or change the shape of the “landscape” (perhaps the equivalent of changing constants in your implicit equations).
HTM models the cortex as having only a small percentage of its neurons firing at any one time. This is a Sparse Distributed Representation (SDR) of whatever inputs are coming in.
As inputs change, the SDRs also change, so you can think of a sequence of SDRs in time. If the layer being modeled has 2048 neurons (which is typical in HTM implementations), it generally has 40 neurons on at any particular time (though the particular neurons that are on and off are constantly changing), and so we can think of the SDR as a single point traveling in a 2048-dimensional space. (For that point, all but 40 of the dimensions would have a value of zero at any time, but the others would be non-zero.)
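As a small illustration of these numbers, an SDR can be stored as the set of its 40 active cells and expanded into a 2048-dimensional point when needed. The helper names below are my own and do not come from any HTM library, and the random patterns are just stand-ins for real encoded inputs.

```python
import random

N_CELLS = 2048   # layer size mentioned in the text
N_ACTIVE = 40    # cells that are on at any one time


def random_sdr(rng):
    """An SDR stored compactly as the set of indices of its active cells."""
    return frozenset(rng.sample(range(N_CELLS), N_ACTIVE))


def to_point(sdr):
    """The same SDR as a point in 2048-dimensional space (1 = firing, 0 = silent)."""
    return [1 if cell in sdr else 0 for cell in range(N_CELLS)]


def overlap(a, b):
    """Number of active cells that two SDRs share."""
    return len(a & b)


rng = random.Random(42)
first, second = random_sdr(rng), random_sdr(rng)
point = to_point(first)
print(len(point), sum(point))
print(overlap(first, first), overlap(first, second))
```

Counting shared active cells is a common way to compare two SDRs: a pattern overlaps fully with itself, while two unrelated random patterns share very few cells.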
In the model below, the sensory inputs come into layer 4 (the layer closest to your skin is layer 1, and Layer 6 is deepest, but the sensory info doesn’t go to 1 or 6 directly.)
The illustration shows flows of inputs and control signals in Multilayer HTM. Sensory (red) inputs from thalamus and (blue) from lower regions flow to L4 and L6. L4 models inputs and transitions, L2/3 temporally pools over L4, passing its outputs up. L5 integrates top-down feedback, L2/3 outputs, and L6 control to produce motor output. L6 uses gating signals (orange) to co-ordinate and control inputs and outputs, and can execute simulation to run dynamics forward.
Layer 4 learns correlations between successive SDRs. The successive SDRs could be formed in response to you moving your eyes from point to point on an object until you recognize it. If SDR ‘a’ usually comes after SDR ‘q’, then links will strengthen between the neurons of the patterns, so that ‘a’ begins to be predicted when you experience ‘q’.
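The strengthening of links between ‘q’ and ‘a’ can be sketched with a toy first-order learner. This is a deliberate simplification and not Numenta’s actual temporal memory algorithm (which is considerably more elaborate); the class name, the threshold and the tiny 4-cell patterns are all assumptions made for illustration.

```python
from collections import defaultdict


class TransitionMemory:
    """Toy learner: strengthens links from cells active now to cells active next."""

    def __init__(self, threshold=8.0, increment=1.0):
        self.threshold = threshold
        self.increment = increment
        self.weights = defaultdict(float)  # (previous cell, next cell) -> link strength

    def learn(self, previous_sdr, next_sdr):
        """Strengthen every link from a cell in the previous SDR to a cell in the next."""
        for pre in previous_sdr:
            for post in next_sdr:
                self.weights[(pre, post)] += self.increment

    def predict(self, current_sdr):
        """Cells whose summed input from the currently active cells reaches the threshold."""
        drive = defaultdict(float)
        for (pre, post), weight in self.weights.items():
            if pre in current_sdr:
                drive[post] += weight
        return frozenset(cell for cell, total in drive.items() if total >= self.threshold)


q = frozenset({1, 5, 9, 13})
a = frozenset({2, 6, 10, 14})

memory = TransitionMemory()
memory.learn(q, a)
prediction_after_one = memory.predict(q)     # links still too weak
memory.learn(q, a)
memory.learn(q, a)
prediction_after_three = memory.predict(q)   # now 'a' is predicted
print(sorted(prediction_after_one), sorted(prediction_after_three))
```

After a single exposure the links are below threshold and nothing is predicted; after three exposures, seeing ‘q’ is enough to predict ‘a’.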
Then a subpopulation of neurons in Layer 2/3 of cortex performs a function known as Temporal Pooling, representing sets of successively predicted SDRs in Layer 4 as a single, stable output. For instance, if the SDRs coming into Layer 4 represent different observations by you of a chair from different angles, the representation in Layer 2/3 might stay stable – if so, it would represent the concept “chair” while of course the representation in Layer 4 keeps changing. So the Temporal Pooling SDR can be seen as a kind of dynamical symbol for the sequence of SDRs currently being traversed in L4. (You could also think of Layer 2/3 as learning the constants in the equations that are governing the behavior in Layer 4, though the constants are more like slow-changing variables).
L4 can be thought of as learning an attractor. L2/3 also is learning an attractor, and if, for instance, you are shifting your eyes suddenly from one object to another, the information coming from L4 to L2/3 is now so different that L2/3 changes its own representation. This can itself be viewed as shifting in L2/3 of its SDR trajectory from one basin of attraction to another. By definition, if a trajectory enters an attractor, it cannot leave, but if you change the governing constants of the implicit equation, you get a different path, with different attractors.
So now, finally, to the “Feynman Machine” and what it’s been used for so far:
Ogma’s Feynman Machine is a hierarchy of nonlinear dynamical systems. There is plenty of feedback, so in each region outputs influence inputs, and higher regions also send feedback to lower regions. Outputs that influence inputs are a feature of dynamical systems that we saw even in our simple example of the multiplying rabbits.
It has been shown that a number of discrete and hybrid dynamical system designs can simulate any Turing Machine. A Turing machine is a very simple machine with simple rules, but it has the power of Universal Computation as does the laptop on your desk. So why use dynamical systems instead of regular computer algorithms?
Fergal gives an example as follows:
A soccer player who runs into the box to head a crossed ball into the net is clearly not solving the simultaneous differential equations of a spinning ball’s motion through moving air, under gravity, nor is his run the result of preplanning a sequence of torques generated by his muscles. The player’s brain has a network of dynamical systems models which have, through practice and experience, learned to predict the flight of the ball and plan a sequence of motor outputs which will, along with intermediate observational updates and corrections, lead to the desired performance of his skill.
In the Feynman machine, each building block is a paired decoder and encoder. A sensory input might be encoded, and then decoded back to a prediction of what it will be next, and the next signal coming in will be compared with the prediction. A diagram of their machine looks like this:
Each encoder/decoder pair is a non-linear dynamical system. The encoder represents its input as an SDR with a limited number of cells firing, and the decoder uses an algorithm to learn a prediction of the next input SDR (at time t+1), combining information from a signal coming down from higher up in the hierarchy, with the representation of the SDR. (The aspect of the output that feeds back into the input is the error signal between the decoder prediction for time (t+1) and the actual input pattern coming into the encoder.)
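A heavily simplified sketch of one such pair follows. It is not Ogma’s implementation: the encoder here is a fixed random projection that keeps the k strongest cells, the decoder is a plain linear map trained on its own prediction error, and the feedback from higher layers is left out entirely. All names and sizes are illustrative assumptions.

```python
import random


class EncoderDecoderPair:
    """Toy predictive pair: k-sparse encoder plus a linear decoder trained on its error."""

    def __init__(self, input_size, hidden_size, k, learning_rate=0.1, seed=0):
        rng = random.Random(seed)
        self.k = k
        self.learning_rate = learning_rate
        # Fixed random encoder weights (an assumption; a real encoder would also learn).
        self.encoder = [[rng.gauss(0.0, 1.0) for _ in range(input_size)]
                        for _ in range(hidden_size)]
        # decoder[i][j] is the contribution of hidden cell j to predicted input i.
        self.decoder = [[0.0] * hidden_size for _ in range(input_size)]

    def encode(self, x):
        """Keep only the k most strongly driven hidden cells (the SDR)."""
        drive = [sum(w * v for w, v in zip(row, x)) for row in self.encoder]
        ranked = sorted(range(len(drive)), key=lambda j: drive[j], reverse=True)
        return frozenset(ranked[:self.k])

    def predict(self, sdr):
        """Decode the SDR into a guess at the next input."""
        return [sum(row[j] for j in sdr) for row in self.decoder]

    def learn(self, sdr, actual_next):
        """Compare the prediction with what actually arrived and correct the decoder."""
        error = [a - p for a, p in zip(actual_next, self.predict(sdr))]
        for i, e in enumerate(error):
            for j in sdr:
                self.decoder[i][j] += self.learning_rate * e
        return sum(e * e for e in error) ** 0.5  # size of the error signal


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


# A short repeating sequence of made-up input patterns.
sequence = [one_hot(i, 8) for i in (0, 3, 5, 6)]
pair = EncoderDecoderPair(input_size=8, hidden_size=16, k=3)
epoch_errors = []
for _ in range(100):
    total = 0.0
    for t, current in enumerate(sequence):
        upcoming = sequence[(t + 1) % len(sequence)]
        total += pair.learn(pair.encode(current), upcoming)
    epoch_errors.append(total / len(sequence))
print(round(epoch_errors[0], 3), round(epoch_errors[-1], 3))
```

The average prediction error in the last pass through the sequence should be smaller than in the first, which is the sense in which the pair learns to anticipate its next input.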
Ogma sees a use for its program in anomaly detection (which Numenta’s algorithm is also used for). If you follow a sequence, let’s say of credit card transactions, and then are surprised by an anomaly in the sequence that conflicts with predictions, you might be detecting a fraudulent transaction. Other applications of anomaly detection include stroke prediction, monitoring of dementia sufferers, monitoring of industrial plant and equipment, energy production, vehicle maintenance, counterterrorism, etc.
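One simple way to turn "surprise" into a number is to ask what fraction of the cells that actually fired had not been predicted. The sketch below assumes predictions and observations are both available as sets of active cells. The 0.5 threshold and the tiny example patterns are invented, and neither company’s real scoring code is being reproduced here.

```python
def anomaly_score(predicted_sdr, actual_sdr):
    """Fraction of active cells that were not predicted (0.0 = expected, 1.0 = total surprise)."""
    if not actual_sdr:
        return 0.0
    unpredicted = actual_sdr - predicted_sdr
    return len(unpredicted) / len(actual_sdr)


def flag_anomalies(predictions, observations, threshold=0.5):
    """Positions in the sequence where the surprise exceeds the threshold."""
    return [
        position
        for position, (predicted, actual) in enumerate(zip(predictions, observations))
        if anomaly_score(predicted, actual) > threshold
    ]


# Hypothetical predicted and observed patterns for three steps of a sequence.
predictions = [{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}]
observations = [{1, 2, 3, 4}, {5, 6, 7, 20}, {30, 31, 32, 33}]

scores = [anomaly_score(p, o) for p, o in zip(predictions, observations)]
flagged = flag_anomalies(predictions, observations)
print(scores, flagged)
```

In this made-up sequence the first step is fully expected, the second is slightly off, and only the third is surprising enough to be flagged.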
Ogma has also hooked up their architecture with deep learning modules, and even to a radio controlled self-driving car model that has a video camera attached. You teach the car by guiding it down a few paths:
It’s not easy to understand what is actually going on in a dynamical system in the brain. The connections between neurons, and the signals going up and down, are mysterious. There has been some work in trying to understand the flow:
One paper, by Giovanni Carmantini et al. (see sources), implements a Turing machine as a dynamical system. You can put known rules of the Turing machine into the dynamical system. To quote the article: “the synaptic weight matrix (of the dynamical system) is explicitly designed from the machine table (of rules) of the encoded automaton” and “the derived approach could bring about the exciting possibility of a symbolic read-out of a learned algorithm from the network weights.”
So, at least in their architecture of a dynamical system, you do know what is going on under the hood.
Another paper, this one by a neurobiologist named Dean Buonomano, shows that any set of neurons that includes recurrent connections will have its own preferred trajectories, which are mostly chaotic, but he can force the neurons to learn one of their preferred trajectories to the point that they are no longer chaotic (for that trajectory). So if you have a sequence of firing patterns of the neurons, even if you perturb that sequence with noise, it will snap back to the sequence. This means that chaos has been tamed, and you no longer have the problem of points at very similar locations diverging dramatically. The trajectory of firing patterns is now predictable, and the region can be used to write words on a screen, if its outputs are trained to do so. His paper is also interesting because he lists some techniques that are available to understand the internals of such systems of neurons. Most of those techniques are statistical, such as looking at the distribution of weights, so while they show some significant changes after learning, you still don’t quite understand how the information is coded and how it is used.
But Ogma’s work is exciting, as is Numenta’s, and I’m sure we can expect some major advances by both companies. Both give you their source code, free, to experiment with.
Here is one last example of what Ogma’s software can do. The lower sequence shows predicted video frames that were learned by training on the video shown in the top sequence.
Sources:
Feynman Machine: The Universal Dynamical Systems Computer, by Eric Laukien, Richard Crowder and Fergal Byrne (https://arxiv.org/abs/1609.03971)
Symphony from Synapses: Neocortex as a Universal Dynamical Systems Modeller using Hierarchical Temporal Memory, by Fergal Byrne (https://arxiv.org/abs/1512.05245). Most of the pictures in this blog post come from that article.
Robust timing and motor patterns by taming chaos in recurrent neural networks, by Rodrigo Laje and Dean V. Buonomano (available on the internet)
A modular architecture for transparent computation in recurrent neural networks, by Giovanni S. Carmantini, Mathieu Desroches and Serafim Rodrigues (available on the internet)