I know a neural network can be trained using gradient descent and I understand how it works.
Recently, I stumbled upon other training algorithms: conjugate gradient and quasi-Newton methods. I tried to understand how they work, but the only intuition I could form is that they use higher-order derivative information.
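To make the comparison concrete, here is a minimal sketch of what I mean (the tiny linear "network", the data, and the hyperparameters are purely illustrative): plain gradient descent on a loss, next to SciPy's conjugate gradient and L-BFGS (a quasi-Newton method), all fed the exact same gradient function.

```python
import numpy as np
from scipy.optimize import minimize

# Tiny synthetic regression problem (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

def loss(w):
    # Mean squared error of a linear "network" with weights w.
    return np.mean((X @ w - y) ** 2)

def grad(w):
    # Gradient of the loss w.r.t. w (what backprop computes for deeper nets).
    return 2.0 * X.T @ (X @ w - y) / len(y)

# Plain gradient descent: the update rule I already understand.
w = np.zeros(3)
for _ in range(500):
    w -= 0.05 * grad(w)
print("gradient descent:", w, loss(w))

# Conjugate gradient and quasi-Newton (L-BFGS) consume the same gradient,
# but choose the search direction and step size differently.
for method in ("CG", "L-BFGS-B"):
    res = minimize(loss, np.zeros(3), jac=grad, method=method)
    print(method, ":", res.x, res.fun)
```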
Are the alternative algorithms I mentioned fundamentally different from backpropagation, where weights are adjusted using the gradient of the loss function?
If not, is there any algorithm for training a neural network that is fundamentally different from the mechanism of backpropagation?