Sorry for the late response, but the answer from Celius is not quite correct.
The `training` parameter of the `Dropout` layer (and of the `BatchNormalization` layer as well) defines whether the layer behaves in training or in inference mode. You can read this in the official documentation.
However, the documentation is a bit unclear about how this affects the execution of your network. Setting `training=False` does not mean that the `Dropout` layer is not part of your network. Contrary to what Celius explained, it is by no means ignored; it just behaves in inference mode. For `Dropout`, this means that no dropout is applied. For `BatchNormalization`, it means that the layer uses the mean and variance estimated during training instead of computing new statistics for every mini-batch. Conversely, if you set `training=True`, the layer behaves in training mode and dropout is applied.
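To make this concrete, here is a minimal sketch (assuming TensorFlow/Keras, since no code was posted) of a single `Dropout` layer called directly in both modes:

```python
import numpy as np
import tensorflow as tf

# A standalone Dropout layer, just to illustrate the effect of the training flag.
dropout = tf.keras.layers.Dropout(rate=0.5)
x = np.ones((1, 10), dtype="float32")

# Inference mode: dropout is a no-op, the input passes through unchanged.
print(dropout(x, training=False).numpy())  # all ones

# Training mode: roughly half of the units are zeroed and the survivors are
# scaled by 1 / (1 - rate), so you see a mix of 0.0 and 2.0.
print(dropout(x, training=True).numpy())
```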
Now to your question: the behavior of your network does not make sense. If dropout were applied to unseen data, there would be nothing to learn from it. You would only be throwing away information, so your results should get worse, not better. I don't think your problem is related to the `Dropout` layer anyway, though. Does your network also use `BatchNormalization` layers? If BN is applied in a poor way, it can mess up your final results. Since you haven't posted any code it is hard to answer fully, but a quick sanity check you could run is sketched below.
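The model below is made up (I have not seen yours); it only shows the kind of check I mean: compare predictions on unseen data with the `training` flag set correctly and incorrectly. If they differ a lot, BN running in the wrong mode is a likely culprit.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in model containing a BatchNormalization layer.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(1),
])

x_new = np.random.rand(8, 4).astype("float32")

# Inference mode (what you want on unseen data): BN normalizes with its stored
# moving mean/variance (accumulated during training on a real, trained model).
preds_ok = model(x_new, training=False)

# Training mode on unseen data: BN normalizes with the statistics of this very
# batch, which can shift the outputs and distort your evaluation.
preds_wrong = model(x_new, training=True)

print(np.max(np.abs(preds_ok.numpy() - preds_wrong.numpy())))
```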