
I'm taking a look at Keras to try to dive into deep learning.

From what I know, stacking just a few dense layers effectively stops backpropagation from working because of the vanishing gradient problem.

I found out that there is a pre-trained VGG-16 neural network that you can download and build on top of.
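
Just to show what I mean, here is a rough sketch of loading that pre-trained base in Keras and adding my own classifier on top (the dense layer sizes and the 10-class output are placeholders I made up):

```python
# Rough sketch of "building on top" of the pre-trained VGG-16 base.
# The classifier layer sizes and the 10-class output are placeholders.
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # keep the pre-trained weights frozen

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```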

This network has 16 layers, so I guess this is the territory where you hit the vanishing gradient problem.

Suppose I wanted to train the network myself in Keras. How should I do it? Should I divide the layers into groups and train them independently as autoencoders, and then stack a classifier on top and train that? Is there a built-in mechanism for this in Keras?

Andrzej Gis

2 Answers


No, the vanishing gradient problem is not as prevalent as it used to be: pretty much all networks (except recurrent ones) now use ReLU activations, which are considerably less prone to this problem.

You should just train a network from scratch and see how it works. Do not try to deal with a problem that you don't have yet.
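
For example, here is a minimal sketch of training a small network end to end with ReLU activations; MNIST is used only as a placeholder dataset, and the layer sizes are arbitrary:

```python
# Minimal sketch: a plain feed-forward network trained end to end with
# ReLU activations -- no layer-wise pre-training involved.
# MNIST is used here only as a placeholder dataset.
import tensorflow as tf
from tensorflow.keras import layers, models

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

model = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(256, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
```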

Dr. Snoopy

Read about skip connections. Although activation functions are the main cause of vanishing gradients, skip connections help as well by carrying a layer's input forward.

Skip connections introduced in the residual block allow the gradient to flow back and reach the initial layers.

We no longer use sigmoid and tanh as activation functions, because they cause the vanishing gradient problem. Nowadays we mostly use ReLU-based activation functions when training deep neural network models to avoid such complications and improve accuracy.
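
As an illustration, here is a minimal sketch of a residual block with a skip connection using the Keras functional API; the filter counts and input shape are arbitrary:

```python
# Minimal sketch of a residual block: the input is added back to the
# output of two conv layers, giving the gradient a direct path back to
# earlier layers. Filter counts and input shape are arbitrary.
from tensorflow.keras import layers, Model, Input

def residual_block(x, filters):
    shortcut = x                                                  # skip connection
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.Add()([shortcut, y])                               # merge skip and main paths
    return layers.Activation("relu")(y)

inputs = Input(shape=(32, 32, 64))
outputs = residual_block(inputs, 64)
model = Model(inputs, outputs)
model.summary()
```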

Faisal Shahbaz