I'm looking at Keras as a way to get into deep learning.
From what I understand, stacking more than a few dense layers can effectively stop backpropagation from working due to the vanishing gradient problem.
I found out that there is a pre-trained VGG-16 neural network that you can download and build on top of.
This network has 16 layers, so I guess this is the territory where you hit the vanishing gradient problem.
Suppose I wanted to train such a network myself in Keras. How should I do it? Should I divide the layers into clusters, train them independently as autoencoders, and then stack a classifier on top and train that? Is there a built-in mechanism for this in Keras?
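To make the idea concrete, here is a minimal sketch of the greedy layer-wise scheme I have in mind, written against TensorFlow's Keras API. The data, layer sizes, and epoch counts are placeholders, and this is just my rough understanding of the approach, not a recommended recipe:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Placeholder toy data standing in for a real dataset.
rng = np.random.default_rng(0)
x_train = rng.random((256, 64)).astype("float32")
y_train = keras.utils.to_categorical(rng.integers(0, 10, 256), 10)

layer_sizes = [32, 16]   # hidden layers to pretrain one at a time
pretrained = []          # encoder layers collected along the way
current_input = x_train

for size in layer_sizes:
    inp_dim = current_input.shape[1]
    # Train a one-hidden-layer autoencoder on the current representation.
    ae_in = keras.Input(shape=(inp_dim,))
    encoder = layers.Dense(size, activation="relu")
    decoded = layers.Dense(inp_dim, activation="linear")(encoder(ae_in))
    autoencoder = keras.Model(ae_in, decoded)
    autoencoder.compile(optimizer="adam", loss="mse")
    autoencoder.fit(current_input, current_input, epochs=1, verbose=0)
    pretrained.append(encoder)
    # The next autoencoder is trained on this layer's encoded output.
    current_input = encoder(current_input).numpy()

# Stack the pretrained encoders and put a classifier on top.
clf_in = keras.Input(shape=(64,))
h = clf_in
for encoder in pretrained:
    h = encoder(h)
out = layers.Dense(10, activation="softmax")(h)
clf = keras.Model(clf_in, out)
clf.compile(optimizer="adam", loss="categorical_crossentropy")
clf.fit(x_train, y_train, epochs=1, verbose=0)
```

Is something along these lines what people actually do, or does Keras provide a more direct mechanism for it?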