I am new to deep learning and am trying to understand the concept behind hidden layers, but I am not clear about the following:
Suppose there are 3 hidden layers. The outputs of all the nodes of the 2nd layer are fed as input to every node of the 3rd layer. What difference does that make among the outputs of the 3rd-layer nodes, given that they all receive the same input and have the same parameter initialization? (From what I have read, I assume that all the nodes of one layer get the same random weights for their parameters.)
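To make the question concrete, here is a minimal sketch in plain Python of the two situations as I understand them. Everything in it is made up for illustration: the 2nd-layer output values, the layer sizes, the tanh activation, the zero biases, and the helper name `layer_output`. Case A is my assumption (every node of the 3rd layer shares one random weight vector); case B is the alternative where each node draws its own random weights.

```python
import math
import random


def layer_output(inputs, weights, biases):
    """Return tanh(w . x + b) for each node of a fully connected layer."""
    return [
        math.tanh(sum(w * x for w, x in zip(node_w, inputs)) + b)
        for node_w, b in zip(weights, biases)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical outputs of the 2nd hidden layer; every 3rd-layer node sees all of them.
layer2_out = [0.5, -1.2, 0.3, 0.8]
n_nodes = 3  # assumed number of nodes in the 3rd layer
biases = [0.0] * n_nodes

# Case A (my assumption): one random weight vector copied to every node.
shared_w = [rng.uniform(-1, 1) for _ in layer2_out]
same_weights = [list(shared_w) for _ in range(n_nodes)]

# Case B: each node draws its own independent random weight vector.
own_weights = [
    [rng.uniform(-1, 1) for _ in layer2_out] for _ in range(n_nodes)
]

out_same = layer_output(layer2_out, same_weights, biases)
out_own = layer_output(layer2_out, own_weights, biases)

print("case A, shared weights:  ", out_same)
print("case B, per-node weights:", out_own)
```

In case A all three nodes print the same value, which is exactly what confuses me. In case B the values differ even though every node receives the same input.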
Please correct me if I am thinking in the wrong direction.