
I know a neural network can be trained using gradient descent and I understand how it works.

Recently, I stumbled upon other training algorithms: conjugate gradient and quasi-Newton algorithms. I tried to understand how they work, but the only intuition I could get is that they use higher-order derivatives.

Are the alternative algorithms I mentioned fundamentally different from a backpropagation process, where weights are adjusted using the gradient of the loss function?

If not, is there an algorithm to train a neural network that is fundamentally different from the mechanism of backpropagation?

    Imho, backpropagation is not a learning algorithm; it's a gradient-calculation algorithm. Learning is then usually done by stochastic gradient descent, but you could also use BFGS and the like. Of course, you could also adjust the weights with genetic algorithms and such, without real gradients – sascha Mar 21 '19 at 18:36

4 Answers


Conjugate gradient and quasi-Newton algorithms are still gradient-based descent algorithms. Backpropagation (or backprop) is nothing more than a fancy name for a gradient computation.

However, the original question of alternatives to backprop is very important. One of the recent alternatives, for example, is equilibrium propagation (eqprop for short).
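
To make the separation concrete, here is a minimal sketch (the toy network, data, and hyperparameters are my own assumptions, not from the answer) in which the same hand-written backprop gradient is consumed by two different optimizers: plain gradient descent and SciPy's quasi-Newton L-BFGS routine.

```python
# One gradient routine ("backprop"), two optimizers that consume it.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))                 # toy inputs
y = np.sin(X.sum(axis=1, keepdims=True))     # toy targets

def loss_and_grad(w):
    W1 = w[:3 * 8].reshape(3, 8)
    W2 = w[3 * 8:].reshape(8, 1)
    h = np.tanh(X @ W1)                      # forward pass
    pred = h @ W2
    err = pred - y
    loss = 0.5 * np.mean(err ** 2)
    # Backprop: just the chain rule, applied layer by layer.
    dpred = err / len(X)
    dW2 = h.T @ dpred
    dh = (dpred @ W2.T) * (1 - h ** 2)
    dW1 = X.T @ dh
    return loss, np.concatenate([dW1.ravel(), dW2.ravel()])

w = rng.normal(scale=0.1, size=3 * 8 + 8)

# Optimizer 1: plain gradient descent -- uses the gradient directly.
for _ in range(300):
    _, g = loss_and_grad(w)
    w -= 0.1 * g

# Optimizer 2: quasi-Newton L-BFGS -- uses the very same gradient, but builds
# an approximate curvature model from successive gradients to pick better steps.
res = minimize(loss_and_grad, w, jac=True, method="L-BFGS-B")
print("gradient descent loss:", loss_and_grad(w)[0], " L-BFGS loss:", res.fun)
```

Both optimizers rely only on first-order gradient information produced by the same backprop routine; the quasi-Newton method does not need extra derivatives from the network, it just uses the history of gradients to approximate curvature.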

penkovsky

Neuroevolution of augmenting topologies (NEAT) is another approach: it learns both the topology of the network and its weights/biases using a genetic algorithm.
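
As a rough illustration of the underlying idea (not NEAT itself, which also mutates the topology and uses crossover with speciation), here is a toy genetic algorithm that evolves only the weights of a fixed-topology network; the network, data, and hyperparameters are assumptions for the sketch.

```python
# Gradient-free "fitness -> select -> mutate" loop over weight vectors.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
y = np.sin(X.sum(axis=1, keepdims=True))

def fitness(w):
    W1 = w[:3 * 8].reshape(3, 8)
    W2 = w[3 * 8:].reshape(8, 1)
    pred = np.tanh(X @ W1) @ W2
    return -np.mean((pred - y) ** 2)         # higher is better

pop = rng.normal(scale=0.5, size=(50, 3 * 8 + 8))     # population of weight vectors
for generation in range(200):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]            # keep the 10 fittest
    children = parents[rng.integers(0, 10, size=40)]   # clone random parents...
    children = children + rng.normal(scale=0.1, size=children.shape)  # ...and mutate
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("best MSE:", -fitness(best))
```

No gradients (and hence no backprop) are involved; the population's fitness scores alone drive the search.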

Frank Liu
    Simulated annealing would be better, I guess, but random search is expensive in whatever form you take it. Genetic algorithms are a twist on the local beam search method. We should note, though, that encoding the weights as a single function is a clever idea and may work when we have enough computational power, e.g. when FPGAs go mainstream – Nulik Apr 09 '20 at 09:51

Consider reading this Medium article on alternatives to backpropagation:

https://link.medium.com/wMZABTTUbwb

  1. Difference Target Propagation

  2. The HSIC Bottleneck (Hilbert-Schmidt Independence Criterion)

  3. Online Alternating Minimization with Auxiliary Variables

  4. Decoupled Neural Interfaces Using Synthetic Gradients

I would also add Monte Carlo based methods: https://arxiv.org/abs/2205.07408
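
As a generic flavour of that family (a simple Metropolis-style accept/reject random walk over the weights, not the specific method of the linked paper; the toy network and data are assumptions), a sketch could look like this:

```python
# Monte Carlo weight search: propose random perturbations, accept/reject them.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
y = np.sin(X.sum(axis=1, keepdims=True))

def loss(w):
    W1 = w[:3 * 8].reshape(3, 8)
    W2 = w[3 * 8:].reshape(8, 1)
    pred = np.tanh(X @ W1) @ W2
    return np.mean((pred - y) ** 2)

w = rng.normal(scale=0.1, size=3 * 8 + 8)
current = loss(w)
temperature = 0.01
for step in range(5000):
    proposal = w + rng.normal(scale=0.05, size=w.shape)   # random perturbation
    new = loss(proposal)
    # Metropolis rule: always accept improvements, occasionally accept worse moves.
    if new < current or rng.random() < np.exp((current - new) / temperature):
        w, current = proposal, new
print("final MSE:", current)
```

Again, no gradient of the loss is ever computed, so backpropagation is not needed at all.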

Ggjj11

There is now a newer algorithm, pioneered by Hinton (who popularized backprop in the first place), called the Forward-Forward algorithm. I won't explain it fully here, but I'll leave some references at the end.

In essence, during training it extracts abstract features from each sample by training each layer individually to return high values for "positive" data (real samples) and low values for "negative" data (synthetic, garbled samples). These features are then fed into a final softmax layer that predicts classes, or a linear layer for regression.

This is advantageous over backprop on smaller processors that can't handle the memory demands of some large models (e.g. it can let you train really big models on a regular PC).
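
As a rough, single-layer sketch of the per-layer training idea (simplified from the paper; the goodness definition, threshold, toy data, and use of autograd for the purely local objective below are illustrative assumptions), each layer is optimized against its own local loss, so no error signal ever propagates back through earlier layers:

```python
# One layer trained "forward-forward" style: high goodness for positive data,
# low goodness for negative data, using only a local objective.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
layer = torch.nn.Linear(20, 64)
opt = torch.optim.SGD(layer.parameters(), lr=0.03)
theta = 2.0                                   # goodness threshold (assumed value)

x_pos = torch.randn(128, 20)                  # stand-in for real samples
x_neg = torch.randn(128, 20) * 3.0            # stand-in for garbled samples

for step in range(500):
    g_pos = torch.relu(layer(x_pos)).pow(2).mean(dim=1)   # goodness of positives
    g_neg = torch.relu(layer(x_neg)).pow(2).mean(dim=1)   # goodness of negatives
    # Push positive goodness above theta and negative goodness below it.
    loss = F.softplus(torch.cat([theta - g_pos, g_neg - theta])).mean()
    opt.zero_grad()
    loss.backward()   # gradient of this layer's local objective only
    opt.step()
```

In a multi-layer version, each layer's (normalized, detached) output is fed to the next layer, which is trained the same way, and a small classifier head on top makes the final prediction.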

Some good descriptions are here:

https://www.youtube.com/watch?v=rVzDRfO2sgs&ab_channel=EdanMeyer
https://www.youtube.com/watch?v=F7wd4wQyPd8&ab_channel=EscVM

And the original paper is:

https://www.cs.toronto.edu/~hinton/FFA13.pdf

SamTheProgrammer