I have observed that with a SpatialDropout2D(0.2) layer after each of 5 Conv2D layers, the training and validation error is much lower during the first few epochs than with the same network without these dropout layers (all else being equal). This seems counterintuitive, since I would expect the optimization routine to have more trouble finding a minimum when intermediate results are randomly dropped out.
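For concreteness, here is a minimal sketch of the kind of network I mean, assuming Keras; the filter counts, kernel size, input shape, and output layer are illustrative placeholders, not my exact architecture:

    from tensorflow import keras
    from tensorflow.keras import layers

    def build_model(use_spatial_dropout=True):
        model = keras.Sequential()
        model.add(keras.Input(shape=(64, 64, 3)))  # hypothetical input shape
        for filters in (32, 32, 64, 64, 128):      # 5 convolutional layers
            model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
            if use_spatial_dropout:
                # Drops entire feature maps with probability 0.2
                model.add(layers.SpatialDropout2D(0.2))
            model.add(layers.MaxPooling2D())
        model.add(layers.Flatten())
        model.add(layers.Dense(10, activation="softmax"))  # hypothetical output
        model.compile(optimizer="adam",
                      loss="categorical_crossentropy",
                      metrics=["accuracy"])
        return model

The comparison I describe is between build_model(True) and build_model(False), trained identically.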
So is my observation plausible? And if so, why?