
Here is my understanding of dropout regularization in a DNN:

Dropout:

First we randomly delete neurons from the DNN, leaving only the input and output layers unchanged. Then we perform forward and backward propagation on a mini-batch, compute the gradients for this mini-batch, and update the weights and biases. I denote these updated weights and biases as Updated_Set_1.

Then, we restore the DNN to its default state and again randomly delete neurons. We perform forward and backward propagation and obtain a new set of weights and biases, Updated_Set_2. This process continues up to Updated_Set_N, where N is the number of mini-batches.

Lastly, we average all the weights and biases across Updated_Set_1 through Updated_Set_N. These averaged weights and biases are then used to predict on new input.

I would just like to confirm whether my understanding is correct. If it is wrong, please share your thoughts and correct me. Thank you in advance.

jjsonname

1 Answer


Well, actually there is no averaging. During training, for every forward/backward pass, we randomly "mute"/deactivate some neurons, so that their outputs and associated weights are not considered when computing the output, nor during backpropagation.

This forces the remaining active neurons to make good predictions without the help of the deactivated neurons, which reduces their dependence on the other neurons (features) and in turn improves the model's generalization.

Other than this, the forward and backward propagation phases are the same as without dropout.
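To make this concrete, here is a minimal sketch of how neurons can be deactivated during a forward pass. It uses NumPy and the common "inverted dropout" variant (which rescales the surviving activations so no change is needed at test time); the function name dropout_forward and the toy shapes are just illustrative, not from the question.

```python
import numpy as np

def dropout_forward(x, drop_prob=0.5, training=True):
    """Apply (inverted) dropout to the activations x of one layer."""
    if not training or drop_prob == 0.0:
        # At prediction time all neurons stay active; no masking is applied.
        return x
    # Randomly deactivate each neuron with probability drop_prob.
    mask = (np.random.rand(*x.shape) > drop_prob).astype(x.dtype)
    # Scale the surviving activations so their expected value is unchanged.
    return x * mask / (1.0 - drop_prob)

# Toy example: hidden-layer activations for a mini-batch of 4 samples, 8 units.
h = np.random.randn(4, 8)
h_train = dropout_forward(h, drop_prob=0.5, training=True)   # some units zeroed
h_test = dropout_forward(h, drop_prob=0.5, training=False)   # all units kept
```

A fresh mask is drawn for every forward/backward pass, so different neurons are muted on each mini-batch, but there is only ever one set of weights being updated, never N sets that get averaged.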

bayethierno