
I'm using dropout layers (tf.keras.layers.Dropout) in a model implemented in TensorFlow. I set `training=True` during training and `training=False` during testing, and the performance is poor. When I accidentally left `training=True` during testing as well, the results got much better. What is happening here, and why does it affect the training loss values? I'm not making any changes to the training itself, and the whole testing process happens after training. Yet changing to `training=True` during testing seems to affect the training process, causing the training loss to get closer to zero, and then the testing results are better. Any possible explanation?
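Here is a minimal sketch of the setup I mean (the real model is larger; the layer sizes and dropout rate below are just placeholders):

```python
import tensorflow as tf

class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense1 = tf.keras.layers.Dense(64, activation="relu")
        self.dropout = tf.keras.layers.Dropout(0.5)
        self.dense2 = tf.keras.layers.Dense(10)

    def call(self, inputs, training=False):
        x = self.dense1(inputs)
        # The flag in question: dropout is only active when training=True
        x = self.dropout(x, training=training)
        return self.dense2(x)

model = MyModel()
# During training I call the model with training=True;
# during testing I call it with training=False. Accidentally leaving
# training=True during testing is what improved my results.
```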

Thanks,

khemedi
  • Possible duplicate of [What does `training=True` mean when calling a TensorFlow Keras model?](https://stackoverflow.com/questions/57320371/what-does-training-true-mean-when-calling-a-tensorflow-keras-model) – Celius Stingher Nov 14 '19 at 17:32
  • Well, the question you shared is about the argument itself. I'm asking more about the behavior it is showing me: if I set it to true during testing, the loss values get closer to 0 and the testing accuracy is higher! – khemedi Nov 14 '19 at 18:22
  • I believe the answer to that question provides the information needed to answer yours. You seem to be familiar with BatchNormalization and Dropout; these are regularization techniques, which is why they lead to better performance during testing. This translates to higher accuracy and lower loss indeed. – Celius Stingher Nov 14 '19 at 18:24
  • So does that mean you can set `training=True` during both the training and testing phases? I thought you MUST set (or at least it only makes sense to set) `training=True` during training and `training=False` during the testing phase. – khemedi Nov 14 '19 at 18:27
  • Plus, I'm confused about why I get very poor results when I set `training=True` during training and `training=False` during testing! – khemedi Nov 14 '19 at 18:28
  • BTW, I edited my question's title to make the difference between my question and the similar one you shared clearer. – khemedi Nov 14 '19 at 18:39

1 Answer


Sorry for the late response, but the answer from Celius is not quite correct.

The `training` parameter of the Dropout layer (and of the BatchNormalization layer as well) defines whether the layer should behave in training or inference mode. You can read this in the official documentation.

However, the documentation is a bit unclear about how this affects the execution of your network. Setting `training=False` does not mean that the Dropout layer is not part of your network. It is by no means ignored, as Celius suggested; it just behaves in inference mode. For Dropout, this means that no dropout will be applied. For BatchNormalization, it means that BN will use the mean and variance estimated during training instead of computing new statistics for every mini-batch. This is really important. The other way around: if you set `training=True`, the layer will behave in training mode and dropout will be applied.
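You can see both behaviors directly in a short sketch (a minimal, self-contained example; the layer sizes and rates are arbitrary):

```python
import tensorflow as tf

# Dropout: only active in training mode
dropout = tf.keras.layers.Dropout(0.5)
x = tf.ones((1, 10))
print(dropout(x, training=False).numpy())  # inference: input passes through unchanged
print(dropout(x, training=True).numpy())   # training: ~half zeroed, rest scaled by 1/(1-0.5)=2.0

# BatchNormalization: training mode normalizes with the current batch's
# statistics (and updates the moving averages); inference mode uses the
# stored moving averages instead.
bn = tf.keras.layers.BatchNormalization()
y = tf.random.normal((4, 10))
out_train = bn(y, training=True)   # uses batch mean/variance
out_infer = bn(y, training=False)  # uses moving mean/variance
```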

Now to your question: the behavior of your network does not make sense. If dropout is applied to unseen data, there is nothing to learn from it; you only throw away information, so your results should get worse, not better. But I suspect your problem is not related to the Dropout layer anyway. Does your network also make use of BatchNormalization layers? If BN is applied in a poor way, it can mess up your final results. I haven't seen any code, though, so it is hard to fully answer your question as is.
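As a side note, and this is only an assumption about your setup since you haven't posted code: if you build the model with the standard Keras APIs and train with `fit()`/`evaluate()`, Keras sets the `training` flag for you, so you usually should not pass it by hand at all:

```python
import tensorflow as tf

# Hypothetical model; the layer sizes and loss are placeholders.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# model.fit(...)      runs Dropout/BN in training mode automatically;
# model.evaluate(...) and model.predict(...) run them in inference mode.
```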

PKlumpp