
I know a neural network can be trained using gradient descent and I understand how it works.

Recently, I stumbled upon other training algorithms: conjugate gradient and quasi-Newton algorithms. I tried to understand how they work, but the only intuition I could get is that they use higher-order derivatives.

Are the alternative algorithms I mentioned fundamentally different from a backpropagation process, where weights are adjusted using the gradient of the loss function?

If not, is there an algorithm to train a neural network that is fundamentally different from the mechanism of backpropagation?

    Imho, backpropagation is not a learning algorithm; it's a gradient-calculation algorithm. Learning is then usually done by stochastic gradient descent, but you could also use BFGS and the like. Of course, you could also adjust the weights with genetic algorithms and such, without real gradients – sascha Mar 21 '19 at 18:36

4 Answers


Conjugate gradient and quasi-Newton algorithms are still gradient-based descent algorithms. Backpropagation (or backprop) is nothing more than a fancy name for a gradient computation.

However, the original question of alternatives to backprop is very important. One of the recent alternatives, for example, is equilibrium propagation (eqprop for short).
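
To make the separation concrete, here is a minimal sketch (the toy network, data, and hyperparameters are my own assumptions, not from the answer) in which the same hand-written backprop gradient is consumed by two different optimizers: plain gradient descent and SciPy's quasi-Newton L-BFGS routine.

```python
# One gradient routine ("backprop"), two optimizers that consume it.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))                 # toy inputs
y = np.sin(X.sum(axis=1, keepdims=True))     # toy targets

def loss_and_grad(w):
    W1 = w[:3 * 8].reshape(3, 8)
    W2 = w[3 * 8:].reshape(8, 1)
    h = np.tanh(X @ W1)                      # forward pass
    pred = h @ W2
    err = pred - y
    loss = 0.5 * np.mean(err ** 2)
    # Backprop: just the chain rule, applied layer by layer.
    dpred = err / len(X)
    dW2 = h.T @ dpred
    dh = (dpred @ W2.T) * (1 - h ** 2)
    dW1 = X.T @ dh
    return loss, np.concatenate([dW1.ravel(), dW2.ravel()])

w = rng.normal(scale=0.1, size=3 * 8 + 8)

# Optimizer 1: plain gradient descent -- uses the gradient directly.
for _ in range(300):
    _, g = loss_and_grad(w)
    w -= 0.1 * g

# Optimizer 2: quasi-Newton L-BFGS -- uses the very same gradient, but builds
# an approximate curvature model from successive gradients to pick better steps.
res = minimize(loss_and_grad, w, jac=True, method="L-BFGS-B")
print("gradient descent loss:", loss_and_grad(w)[0], " L-BFGS loss:", res.fun)
```

Both optimizers rely only on first-order gradient information produced by the same backprop routine; the quasi-Newton method does not need extra derivatives from the network, it just uses the history of gradients to approximate curvature.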

penkovsky

Neuroevolution of augmenting topologies (NEAT) is another approach: it learns both the topology of the network and its weights/biases using a genetic algorithm.
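
As a rough illustration of the underlying idea (not NEAT itself, which also mutates the topology and uses crossover with speciation), here is a toy genetic algorithm that evolves only the weights of a fixed-topology network; the network, data, and hyperparameters are assumptions for the sketch.

```python
# Gradient-free "fitness -> select -> mutate" loop over weight vectors.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
y = np.sin(X.sum(axis=1, keepdims=True))

def fitness(w):
    W1 = w[:3 * 8].reshape(3, 8)
    W2 = w[3 * 8:].reshape(8, 1)
    pred = np.tanh(X @ W1) @ W2
    return -np.mean((pred - y) ** 2)         # higher is better

pop = rng.normal(scale=0.5, size=(50, 3 * 8 + 8))     # population of weight vectors
for generation in range(200):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]            # keep the 10 fittest
    children = parents[rng.integers(0, 10, size=40)]   # clone random parents...
    children = children + rng.normal(scale=0.1, size=children.shape)  # ...and mutate
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("best MSE:", -fitness(best))
```

No gradients (and hence no backprop) are involved; the population's fitness scores alone drive the search.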

Frank Liu
    Simulated annealing would be better, I guess, but random search is expensive in whatever form you take it. Genetic algorithms are a twist on the local beam search method. We should note, though, that encoding the weights as a single function is a clever idea and may work when we have enough computational power, e.g. when FPGAs go mainstream – Nulik Apr 09 '20 at 09:51

Consider reading this Medium article on alternatives to backpropagation:

https://link.medium.com/wMZABTTUbwb

  1. Difference Target Propagation

  2. The HSIC Bottleneck (Hilbert-Schmidt Independence Criterion)

  3. Online Alternating Minimization with Auxiliary Variables

  4. Decoupled Neural Interfaces Using Synthetic Gradients

I would also add Monte Carlo based methods: https://arxiv.org/abs/2205.07408
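
As a generic flavour of that family (a simple Metropolis-style accept/reject random walk over the weights, not the specific method of the linked paper; the toy network and data are assumptions), a sketch could look like this:

```python
# Monte Carlo weight search: propose random perturbations, accept/reject them.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
y = np.sin(X.sum(axis=1, keepdims=True))

def loss(w):
    W1 = w[:3 * 8].reshape(3, 8)
    W2 = w[3 * 8:].reshape(8, 1)
    pred = np.tanh(X @ W1) @ W2
    return np.mean((pred - y) ** 2)

w = rng.normal(scale=0.1, size=3 * 8 + 8)
current = loss(w)
temperature = 0.01
for step in range(5000):
    proposal = w + rng.normal(scale=0.05, size=w.shape)   # random perturbation
    new = loss(proposal)
    # Metropolis rule: always accept improvements, occasionally accept worse moves.
    if new < current or rng.random() < np.exp((current - new) / temperature):
        w, current = proposal, new
print("final MSE:", current)
```

Again, no gradient of the loss is ever computed, so backpropagation is not needed at all.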

Ggjj11

There is now a newer algorithm, pioneered by Hinton (who popularized backprop in the first place), called the Forward-Forward algorithm. I won't explain it fully here, but I'll leave some references at the end.

In essence, during training it extracts abstract features from each sample by training each layer individually to return high values for "positive" data (real samples) and low values for "negative" data (synthetic, garbled samples). These features are then fed into a final softmax layer that predicts classes, or a linear layer for regression.

This is advantageous over backprop on smaller processors that can't handle the memory demands of some large models (e.g. it can let you train really big models on a regular PC).
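
As a rough, single-layer sketch of the per-layer training idea (simplified from the paper; the goodness definition, threshold, toy data, and use of autograd for the purely local objective below are illustrative assumptions), each layer is optimized against its own local loss, so no error signal ever propagates back through earlier layers:

```python
# One layer trained "forward-forward" style: high goodness for positive data,
# low goodness for negative data, using only a local objective.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
layer = torch.nn.Linear(20, 64)
opt = torch.optim.SGD(layer.parameters(), lr=0.03)
theta = 2.0                                   # goodness threshold (assumed value)

x_pos = torch.randn(128, 20)                  # stand-in for real samples
x_neg = torch.randn(128, 20) * 3.0            # stand-in for garbled samples

for step in range(500):
    g_pos = torch.relu(layer(x_pos)).pow(2).mean(dim=1)   # goodness of positives
    g_neg = torch.relu(layer(x_neg)).pow(2).mean(dim=1)   # goodness of negatives
    # Push positive goodness above theta and negative goodness below it.
    loss = F.softplus(torch.cat([theta - g_pos, g_neg - theta])).mean()
    opt.zero_grad()
    loss.backward()   # gradient of this layer's local objective only
    opt.step()
```

In a multi-layer version, each layer's (normalized, detached) output is fed to the next layer, which is trained the same way, and a small classifier head on top makes the final prediction.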

Some good descriptions are here:

https://www.youtube.com/watch?v=rVzDRfO2sgs&ab_channel=EdanMeyer
https://www.youtube.com/watch?v=F7wd4wQyPd8&ab_channel=EscVM

And the original paper is:

https://www.cs.toronto.edu/~hinton/FFA13.pdf

SamTheProgrammer