
I am new to deep learning and trying to understand the concept of hidden layers, but I am not clear on the following:

Suppose there are 3 hidden layers. When we feed the outputs from all the nodes of the 2nd layer as input to all the nodes of the 3rd layer, what difference does it make in the outputs of the 3rd-layer nodes, given that they receive the same input and the same parameter initialization? (From what I have read, I assumed that all the nodes of one layer get the same random initial weights.)

Please correct me if I am thinking in the wrong direction.

1 Answer


The short answer is: because of random initialization.

If you started with the same weights throughout the neural network (NN), then every node in a layer would produce the same output. Your assumption is the other way around: in practice, each weight is drawn independently at random, precisely to avoid this.

This is because the backpropagation algorithm distributes the error signal in proportion to each node's weights and activations. If all nodes start out identical, they receive identical gradient updates, remain identical after every training step, and therefore never learn different features.
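Here is a minimal NumPy sketch of that symmetry problem (the network shape, constant 0.3, and input values are my own illustration, not from the answer): one hidden layer with two units, every weight set to the same constant, and one forward/backward pass.

```python
import numpy as np

x = np.array([0.5, -1.0])          # single input example
y = 1.0                            # target

W1 = np.full((2, 2), 0.3)          # hidden weights: all identical
W2 = np.full((1, 2), 0.3)          # output weights: all identical

h = np.tanh(W1 @ x)                # both hidden units compute the same value
print(h)                           # two identical activations

y_hat = float(W2 @ h)
# Backprop for squared error 0.5 * (y_hat - y)**2
d_out = y_hat - y
dh = d_out * W2.flatten()          # same error signal reaches both hidden units
dW1 = np.outer(dh * (1 - h**2), x) # tanh'(z) = 1 - h**2
print(dW1)                         # identical rows -> the units stay identical forever
```

Because the two rows of `dW1` are identical, a gradient-descent update keeps the two hidden units equal, so the network effectively has one hidden unit no matter how wide the layer is.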

So random initialization breaks this symmetry and lets each node specialize. After training, the nodes in a hidden layer will produce different outputs even when the input is the same.
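For contrast, here is the same toy setup with random initialization (again a sketch with made-up values): the hidden units differ from the very first forward pass, so they receive different gradients and can learn different features.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([0.5, -1.0])

W1 = rng.normal(scale=0.3, size=(2, 2))  # each weight drawn independently
h = np.tanh(W1 @ x)
print(h)  # two distinct activations -> symmetry is broken from step one
```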

Hope this helps.
