
I'm training a deep neural network with N hidden layers, and I found that both train and test accuracy get worse as N becomes larger (i.e., more hidden layers).

As far as I know, when a neural network becomes deeper, its performance may degrade due to vanishing/exploding gradients, meaning that the layers close to the input can't update their weights stably because the gradients reaching them are either very small or very large.

But after checking each layer's gradients, I found that the gradients in the deeper layers are smaller than in the shallow layers, e.g. 10^-5 in the first hidden layer and 10^-10 in the last hidden layer, which is the opposite of what I expected.
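
For reference, here is a minimal sketch of how the per-layer gradient magnitudes can be checked (PyTorch assumed; the toy MLP and layer sizes below are placeholders, not my actual network):

    import torch
    import torch.nn as nn

    # Toy MLP just to illustrate the check; layer sizes are placeholders.
    model = nn.Sequential(
        nn.Linear(32, 64), nn.Sigmoid(),
        nn.Linear(64, 64), nn.Sigmoid(),
        nn.Linear(64, 64), nn.Sigmoid(),
        nn.Linear(64, 1),
    )

    x = torch.randn(16, 32)   # dummy input batch
    y = torch.randn(16, 1)    # dummy targets
    loss = nn.MSELoss()(model(x), y)
    loss.backward()

    # Print the mean absolute weight gradient of each Linear layer,
    # from the first hidden layer to the last.
    for name, param in model.named_parameters():
        if name.endswith("weight"):
            print(name, param.grad.abs().mean().item())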

Is there any misunderstanding in my reasoning? Or is there another reason why the results get worse when my model goes deeper? Thanks.

Nick Lin
  • It goes the other way: deeper layers get smaller gradients, which is exactly what your results show; it is not about the shallow layers (close to the input). – Dr. Snoopy Jul 11 '22 at 09:02

1 Answer


As the network gets deeper and deeper, the data passed forward through the network gets smaller and smaller (as in a CNN with pooling layers), which has nothing to do with vanishing gradients; but when you go backward, the vanishing gradients start to appear. It is actually a two-sided phenomenon.
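
To illustrate the forward-direction shrinking (a rough sketch, assuming PyTorch; the layer sizes are arbitrary), you can print the spatial size of the data after each conv + pool stage:

    import torch
    import torch.nn as nn

    x = torch.randn(1, 3, 64, 64)   # dummy 64x64 RGB image

    # Repeated conv + pool stages; each pooling layer halves the spatial size.
    stages = nn.ModuleList([
        nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
        nn.Sequential(nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
        nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
    ])

    for i, stage in enumerate(stages):
        x = stage(x)
        print(f"after stage {i}: {tuple(x.shape)}")   # 64 -> 32 -> 16 -> 8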

lofy
  • Thanks for the reply, but I'm confused by "data got smaller". Why does this happen? Are there any ways to solve it? – Nick Lin Jul 11 '22 at 06:00
  • Do you mean that each hidden layer's input becomes smaller during the forward pass, which causes the gradients to become smaller in the deeper layers? – Nick Lin Jul 11 '22 at 06:03
  • Take the case of a deep CNN: the basic structure is convolution layers followed by pooling layers. Each pooling layer reduces the size of the image passing through the network, which reduces the amount of information. – lofy Jul 13 '22 at 15:55