Learning weights between a pair of neurons: a long way past Hebb

While many artificial neural nets use a weight-update rule that does not involve time, spike-timing-dependent plasticity (STDP) has been found to be common in the brain. Here it is not enough to say that neuron A fires at the same time as neuron B and therefore the weight between them increases. For synapses between cortical or hippocampal pyramidal neurons, a presynaptic spike a few milliseconds before a postsynaptic one typically leads to long-term potentiation (LTP), which strengthens the weight, whereas the reverse timing leads to depression, which weakens it. In this way a weight is strengthened only from an event earlier in time to an event later in time, and not vice versa. But this is not the whole story either.
One anomaly occurs when triplets of neural spikes are applied, instead of just pairs.
Consider these two triplets:

post then pre then post
pre then post then pre

where ‘pre’ is the firing of a presynaptic neuron and ‘post’ is the firing of a postsynaptic neuron.

Both triplets contain the same two transitions: (post, pre) and (pre, post).

The only difference is the order of the transitions.

Under a pair-based rule, they both should have the same effect. But experimentally, pre-post-pre with the same timing differences leads to insignificant changes, whereas post-pre-post induces a strong potentiation of synapses in hippocampal cultures.
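To see why a pair-based rule cannot distinguish the two triplets, here is a minimal sketch of a classic all-to-all pair-based STDP rule. The constants are illustrative placeholders, not values fitted to data:

```python
import math

# Illustrative constants (not fitted to data): amplitudes and time
# constants for a classic pair-based STDP rule.
A_PLUS, A_MINUS = 0.01, 0.01
TAU_PLUS, TAU_MINUS = 20.0, 20.0  # milliseconds

def pair_stdp(pre_times, post_times):
    """Total weight change under an all-to-all pair-based STDP rule.

    Each (pre, post) pair contributes potentiation if the presynaptic
    spike precedes the postsynaptic one, and depression otherwise.
    """
    dw = 0.0
    for t_pre in pre_times:
        for t_post in post_times:
            dt = t_post - t_pre
            if dt > 0:        # pre before post -> potentiation
                dw += A_PLUS * math.exp(-dt / TAU_PLUS)
            elif dt < 0:      # post before pre -> depression
                dw -= A_MINUS * math.exp(dt / TAU_MINUS)
    return dw

# post-pre-post: post at 0 ms and 20 ms, pre at 10 ms
dw_ppp = pair_stdp(pre_times=[10], post_times=[0, 20])
# pre-post-pre: pre at 0 ms and 20 ms, post at 10 ms
dw_pp = pair_stdp(pre_times=[0, 20], post_times=[10])

print(dw_ppp, dw_pp)  # identical -- the pair rule cannot tell the triplets apart
```

Both triplets decompose into the same set of pairwise intervals, so any rule that only sums over pairs must predict the same weight change for both, contrary to the experiments.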

So Claudia Clopath and Wulfram Gerstner came up with a simple but better theory.

First, some terminology. Let U be the instantaneous voltage in the postsynaptic neuron, and let U~ be the voltage in that neuron over a longer time period (a low-pass time filter of the depolarization will give you that). You can think of U~ as a moving average of the voltage.

They found that if the postsynaptic neuron had been depolarized for some time, and subsequently a presynaptic spike occurs, the result is ‘depression’ of the connection. By depolarized, they mean depolarized above a threshold T-, which is a lower threshold than T+, the one that comes into play in ‘potentiation’.
This sequence might happen if a postsynaptic spike occurred recently, so that the depolarization of the postsynaptic neuron had not yet descended to resting values, and then a presynaptic spike comes in.


Potentiation of the synapse occurs if the following three conditions are met simultaneously:

(i) The momentary postsynaptic voltage U is above a threshold T+, which is around the firing threshold of the neuron.
(ii) The low-pass filtered voltage U~ is above T-.
(iii) A presynaptic spike occurred a few milliseconds earlier and has left a “trace” x at the site of the synapse. The trace could represent the amount of glutamate bound at the postsynaptic receptor, or the percentage of NMDA receptors in an upregulated state, or something similar.

Note that the postsynaptic neuron enters the picture twice. First, we need a spike to push U over the threshold T+, and second, the filtered membrane potential must be depolarized before the spike. That depolarization could be due to earlier action potentials, which leave a depolarizing spike after-potential (this is what makes post-pre-post and pre-post-pre triplets behave differently), or to sustained input at other synapses.

The model takes a weighted combination of the influence of these quantities (the spikes and the longer-term voltage) on depression and their influence on potentiation, and that combination predicts what will happen to the synaptic weights. In other words, the model doesn’t assume that only one process happens at a time at a synapse; it assumes both happen to some extent, at the same time.
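The ingredients above can be put together in a small discrete-time sketch of a voltage-based plasticity rule in the spirit of Clopath and Gerstner’s model. This is a simplified toy, not the paper’s actual equations, and all the constants are illustrative placeholders:

```python
# A minimal discrete-time sketch of a voltage-based plasticity rule in
# the spirit of the Clopath-Gerstner model. Constants are illustrative
# placeholders, NOT the fitted values from the paper.
DT = 1.0             # time step (ms)
THETA_MINUS = -70.0  # lower voltage threshold T- (mV), used for depression
THETA_PLUS = -45.0   # upper threshold T+ (mV), near firing threshold
A_LTD, A_LTP = 1e-4, 1e-4
TAU_X, TAU_U = 15.0, 10.0  # time constants: presynaptic trace, filtered voltage

def simulate_weight(u, pre_spikes, w0=0.5):
    """Evolve one synaptic weight given the postsynaptic voltage u(t)
    (mV, one sample per DT) and a boolean list of presynaptic spikes."""
    w = w0
    x = 0.0        # presynaptic trace (e.g. bound glutamate)
    u_bar = u[0]   # low-pass filtered, "longer-term" voltage U~
    for t in range(len(u)):
        # Depression: a presynaptic spike while the *filtered* voltage
        # is above THETA_MINUS weakens the synapse.
        if pre_spikes[t]:
            w -= A_LTD * max(u_bar - THETA_MINUS, 0.0)
        # Potentiation: momentary voltage above THETA_PLUS, filtered
        # voltage above THETA_MINUS, and a recent presynaptic trace.
        # Both processes can act in the same time step.
        w += DT * A_LTP * x * max(u[t] - THETA_PLUS, 0.0) \
                           * max(u_bar - THETA_MINUS, 0.0)
        # Update the trace and the filtered voltage.
        x += DT * (-x / TAU_X) + (1.0 if pre_spikes[t] else 0.0)
        u_bar += DT * (u[t] - u_bar) / TAU_U
    return w
```

With a presynaptic spike shortly before a postsynaptic depolarization the weight grows; with the reverse ordering, the lingering elevated U~ at the time of the presynaptic spike produces depression, matching the qualitative behaviour described above.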

For the plasticity experiments considered here, it is crucial that a spike leaves an after-depolarization, so that a trace of the spike lasts for about 50 milliseconds.

The model has some complications, for instance:

The plasticity model depends directly on the postsynaptic voltage at the synapse; depending on the location of the synapse along the dendrite, the time course of the voltage is expected to be different.

This means that when a postsynaptic spike occurs, the depolarization spreads over the neuron, both forward along the axon and backward into the dendrites. Since some dendrites are further than others from the initiation point of the spike, their potential will differ, and so the model might predict different weight changes in one area of the neuron than in another.

In another paper, Claudia Clopath and Jacopo Bono describe a different type of spike altogether. Not only do neurons have somatic spikes that travel down the axon and release neurotransmitter, they also have NMDA spikes, which occur most often in the distal parts of dendrites (the far ends of the dendrites, away from the soma). (Somatic spikes are more easily triggered by inputs to the proximal parts of dendrites.) A somatic spike is the firing of the neuron; an NMDA spike is a brief spike, usually at a dendrite, that cannot fire the neuron by itself. dLTP is the abbreviation used for potentiation in the dendrites. One important difference between learning caused by the two types of spikes is that the target neuron has to actually fire an action potential for learning to happen with somatic spikes, but the target neuron does not have to fire for learning to occur with NMDA spikes.
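The key difference can be summarized in a toy decision function (my own illustration, not either paper’s equations): proximal synapses need the neuron itself to fire, while clustered distal synapses can potentiate via a local NMDA spike even when the soma stays silent.

```python
# Toy illustration of the distinction described above. The function,
# its arguments, and the compartment labels are hypothetical names for
# the sake of the example.

def potentiates(pre_active, somatic_spike, nmda_spike, compartment):
    """Return True if a recently active synapse would strengthen."""
    if not pre_active:               # no presynaptic trace, nothing to strengthen
        return False
    if compartment == "proximal":    # classic STDP-style LTP: needs the neuron to fire
        return somatic_spike
    if compartment == "distal":      # dLTP: a local NMDA spike suffices
        return nmda_spike or somatic_spike
    raise ValueError(compartment)

# A distal synapse can learn even though the neuron never fired:
print(potentiates(True, somatic_spike=False, nmda_spike=True, compartment="distal"))    # True
# The same input on a proximal synapse does not:
print(potentiates(True, somatic_spike=False, nmda_spike=True, compartment="proximal"))  # False
```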

Clopath and Bono speculate on the implications:

For this purpose, we consider a memory which associates several features. Such an association should be robust when presenting only its components, since neurons taking part in multiple assemblies and various sources of noise can lead to an incomplete activation of the association. For example, imagine that we have learned the association “coffee”, with components such as the colour brown, its taste, its smell, etc. We do not forget to associate the colour brown with the item “coffee”, even though we experience that colour much more often than we experience coffee…

We studied how ongoing activity affects memory retention in networks. In the first network, we implemented four groups of neurons, which we take to represent four features that constitute one association. For example, with “coffee” one could associate the features “drink”; “colour brown”; “hot”; and “something you like”. The network neurons are all-to-all connected and the connections are randomly distributed across distal and proximal compartments. Importantly, the distal connections coming from neurons of the same feature are always clustered on the same distal compartment post-synaptically. We simulated ongoing activity by randomly choosing a feature and activating it. Carrying on the previous example, we can imagine that we encounter a brown colour, a drink, something hot and something you like at occasions other than when thinking about coffee. Since the neurons from these different features are never activated together, proximal weights between different features are weakened.

In other words, the various attributes of coffee don’t always go together. Sometimes the color brown might fire when viewing soil, or chocolate. So a target neuron for ‘coffee’ might sometimes get an input from ‘brown’ when the other attributes of ‘coffee’ are not present. The neuron won’t have enough inputs to fire, and this means the link from ‘brown’ should weaken.

The authors continue:

However, the active features always stimulate distally projecting clustered synapses, and NMDA spikes will be evoked more easily. As a result, we find that the distal weights between neurons of different features do not depress substantially compared to the proximal weights….[we] explored how the learning and re-learning of such associations affect each other. We divided a network into two associative memories, for example one “chocolate” and the other “coffee”. Each consists of 4 groups of neurons, representing different features of the association. Both chocolate and coffee share colour and “something you like” features while having two unshared features each…
Our simulations suggest that dLTP allows a subset of strengthened weights to be maintained for a longer time compared to STDP. Due to this mechanism, a trace of a previously learned memory can remain present even when the memory has not been activated for a long time. dLTP protects the weights from being weakened by ongoing activity, while synapses unable to evoke dLTP are depressed.

The authors also speculate that the NMDA spikes, which have a depolarizing effect on the soma, make it easier for normally weak inputs to trigger a somatic spike. So the NMDA spikes act as a ‘teacher’ that gates an input.

At minimum, this research shows that learning, even in a single neuron, is more complicated than had been thought.

To make things even more complicated, there is a new theory from a group in Israel that synaptic weights are not the only weights: two or three synapses can converge onto a dendrite segment, and that segment has its own separate weight, which also learns.


In their paper, the dendrite learns using the same rule as the synapses, an STDP rule that uses some of the same information (such as the depolarization of the target neuron), and the result is weights that do not end up at extreme high or low values, but which can stabilize at intermediate values (though those values can oscillate). Nobody has yet found oscillations in dendrites, but they would be difficult to find.
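The two-level-weight idea can be sketched very simply. This is my own toy construction to illustrate the architecture, not the group’s actual equations: several synapses converge on a dendritic segment that carries its own learnable weight, and both levels are updated from the same pre/post coincidence.

```python
# Toy two-level weight architecture: synaptic weights feed a dendritic
# segment that has its own learnable weight. All names and the update
# rule are illustrative, not taken from the paper.

def segment_output(syn_weights, inputs, dendrite_weight):
    """Output of one dendritic segment: the dendrite weight scales the
    summed synaptic drive before it reaches the soma."""
    drive = sum(w * x for w, x in zip(syn_weights, inputs))
    return dendrite_weight * drive

def update(syn_weights, dendrite_weight, inputs, post_depolarized, lr=0.05):
    """Hebbian-style update applied at BOTH levels: each synapse and the
    shared dendrite weight learn from the same pre/post coincidence."""
    sign = 1.0 if post_depolarized else -1.0
    new_syn = [w + sign * lr * x for w, x in zip(syn_weights, inputs)]
    drive = sum(w * x for w, x in zip(syn_weights, inputs))
    new_dend = dendrite_weight + sign * lr * drive
    return new_syn, new_dend
```

Because the dendrite weight multiplies the whole group of synapses, its changes affect all of them at once, which is one way such an architecture can produce richer dynamics (including the intermediate, possibly oscillating values mentioned above) than independent synaptic weights alone.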

It is likely that neural nets of the future that take inspiration from current biology will have units that are more complex than today’s. In fact, the authors of the dendrite-learning paper say that so many different firing patterns are created by their architecture that “notions like capacity of a network, capacity per weight and generalization have to be redefined”, and that this should include “the possible number of oscillatory attractors for the weights… [with] their implication on advanced deep learning algorithms”.


Voltage and spike timing interact in STDP – a unified model, by Claudia Clopath and Wulfram Gerstner
Modeling somatic and dendritic spike mediated plasticity at the single neuron and network level, by Jacopo Bono and Claudia Clopath
Adaptive nodes enrich nonlinear cooperative learning beyond traditional adaptation by links, by Shira Sardi, Roni Vardi, Amir Goldental, Anton Sheinin, Herut Uzan and Ido Kanter
