
I am trying to train a simple MobileNetV3Small from keras.applications, as shown below:

import os

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# INPUT_SHAPE, DATA_ROOT and SAVE_DIR are defined elsewhere
base_model = keras.applications.MobileNetV3Small(
    input_shape=INPUT_SHAPE,
    alpha=0.125,
    include_top=False,
    classes=1,
    dropout_rate=0.2,
    weights=None)

x = keras.layers.Flatten()(base_model.output)
preds = keras.layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs=base_model.input, outputs=preds)

model.compile(loss="binary_crossentropy",
              optimizer='RMSprop',
              metrics=["binary_accuracy"])

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=40,
    horizontal_flip=True,
    vertical_flip=True,
)

train_generator = train_datagen.flow_from_directory(
    os.path.join(DATA_ROOT, 'train'),
    target_size=(56, 56),
    batch_size=128,
    class_mode="binary",
)

validation_datagen = ImageDataGenerator(rescale=1.0 / 255)
validation_generator = validation_datagen.flow_from_directory(
    os.path.join(DATA_ROOT, 'val'),
    target_size=(56, 56),
    batch_size=128,
    class_mode="binary",
)

model_checkpoint_callback = keras.callbacks.ModelCheckpoint(
    filepath=SAVE_DIR,
    save_weights_only=True,
    monitor='val_binary_accuracy',
    mode='max',
    save_best_only=True)

es_callback = keras.callbacks.EarlyStopping(patience=10)

model.fit(train_generator,
          epochs=100,
          validation_data=validation_generator,
          callbacks=[model_checkpoint_callback, es_callback],
          shuffle=True)

When I train the model, I get a validation accuracy of around 0.94. But when I call model.evaluate on the exact same validation data, the accuracy drops to 0.48, and when I call model.predict on any data it outputs a constant value, 0.51...

There is nothing wrong with the learning rate, optimizer, or metrics. What could be wrong here?


EDIT:

After training, when I run

pred_results = model.evaluate(validation_generator)
print(pred_results)

it gives me this output for a network trained for 1 epoch:

6/6 [==============================] - 1s 100ms/step - loss: 0.6935 - binary_accuracy: 0.8461

However, when I save and load the model with either model.save() or tf.keras.models.save_model(), the output becomes something like this:

6/6 [==============================] - 2s 100ms/step - loss: 0.6935 - binary_accuracy: 0.5028
[0.6935192346572876, 0.5027709603309631]

and the output of model.predict(validation_generator) is:

[[0.5080832] [0.5080832] [0.5080832] [0.5080832] . . . [0.5080832] [0.5080832]]


What I've tried so far:

  1. Used tf.keras.utils.image_dataset_from_directory() instead of ImageDataGenerator.
  2. Fixed TensorFlow and NumPy seeds globally (see the sketch after this list).
  3. Found a similar problem in another SO post, and decreased the momentum parameter of the MobileNet BatchNormalization layers one by one:
for layer in model.layers[0].layers:
    if isinstance(layer, tf.keras.layers.BatchNormalization):
        layer.momentum = 0.9
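
For reference, the global seed fixing from step 2 looked roughly like this (a minimal sketch; the exact calls and the seed value are assumptions):

import random
import numpy as np
import tensorflow as tf

SEED = 42                 # arbitrary value, for illustration only
random.seed(SEED)         # Python RNG
np.random.seed(SEED)      # NumPy RNG
tf.random.set_seed(SEED)  # TensorFlow RNG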

The first two steps have no effect, but after applying the third step I no longer get the same prediction for every input. However, evaluate() and predict() still report different accuracy values.

Bhoke

3 Answers


Have you tried setting shuffle=False in validation_datagen.flow_from_directory()? It's a little misleading, but the .flow_from_directory() method shuffles by default, which is problematic when generating your validation dataset: it shuffles your validation data when you call .predict(), whereas in your training loop the .fit method implicitly DOESN'T shuffle the validation set.
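
For example, a minimal sketch of the validation generator with shuffling disabled, reusing the DATA_ROOT, target size, and batch size from your question:

validation_datagen = ImageDataGenerator(rescale=1.0 / 255)
validation_generator = validation_datagen.flow_from_directory(
    os.path.join(DATA_ROOT, 'val'),
    target_size=(56, 56),
    batch_size=128,
    class_mode="binary",
    shuffle=False,  # keep a fixed order so predictions line up with labels
)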

The reason I think this is the issue is that you state calling .predict() on the validation set nets you ~0.5 accuracy, and you're running a binary classification (sigmoid output with binary cross-entropy loss), which makes perfect sense IF you're (mistakenly) shuffling your validation data. An untrained binary classifier on a balanced dataset will usually get around 50% accuracy (0.5 for 0, 0.5 for 1), since it's just guessing at that point.

Source: I've built and trained a lot of image classification models before, and this happened to me a lot.

AndrewJaeyoung
  • After I decreased the momentum parameter, the model started to give different outputs. Please also see my comment under Mercury's answer. I am suspicious about shuffling too. However, I do not understand why shuffling matters when you import the whole validation dataset? – Bhoke Jul 12 '22 at 22:03
  • @Bhoke You're correct, in principle it shouldn't matter, but I think it matters here because you didn't set the seed argument in `validation_datagen.flow_from_directory()`. If I recall correctly, it will shuffle every time you call the data unless you set the seed directly in that method. I know you specified that you set the seed in tensorflow and numpy globally, but I think you need to set the seed directly in this method because it's Keras-specific. So try either setting `shuffle=False`, or leave it alone and set `seed=` to some integer. – AndrewJaeyoung Jul 13 '22 at 07:19
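
(A sketch of the `seed=` option mentioned in the comment above, reusing the generator call from the question; the seed value itself is arbitrary:)

validation_generator = validation_datagen.flow_from_directory(
    os.path.join(DATA_ROOT, 'val'),
    target_size=(56, 56),
    batch_size=128,
    class_mode="binary",
    seed=42,  # makes the (default) shuffling reproducible
)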

It might be worth trying model.save_weights('directory'), then rebuilding your model (I think here that means re-running the base_model = ... code) and restoring it with model.load_weights('directory'). That is what I do in my own models, and when I do that, the accuracy/loss stay exactly the same before and after saving and loading.
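
A rough sketch of that round trip, assuming a hypothetical build_model() helper that re-runs the model-construction code from the question (the helper name and the weights path are for illustration only):

WEIGHTS_PATH = 'my_weights'  # hypothetical path

model.save_weights(WEIGHTS_PATH)        # save only the weights after training

fresh_model = build_model()             # rebuild the architecture from scratch
fresh_model.load_weights(WEIGHTS_PATH)  # restore the trained weights
fresh_model.compile(loss="binary_crossentropy",
                    optimizer='RMSprop',
                    metrics=["binary_accuracy"])

print(fresh_model.evaluate(validation_generator))  # should match the pre-save metrics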

Mercury
  • Already tried it. Maybe I should re-edit the question to mention it. – Bhoke Jul 06 '22 at 11:25
  • Have you tried checking that the weights are the same pre- and post-save? As in `if model.get_weights() == open("directory","r") print("same")`? And when you load the model, do you always get the same accuracy? E.g. load it once and get 0.5123, then delete the model, load it again, and the accuracy is the same. Because if not, it would seem like the training set is changing, or the weights are not in fact being loaded. – Mercury Jul 06 '22 at 11:48
  • Yes, also checked it; there is no problem with the weights. Decreasing the `momentum` parameter of MobileNet seems to solve the predict issue (I guess the model could not learn with high momentum, or it led to a dying-ReLU problem). I think a fixed seed is also going to solve the `evaluate()` and `predict()` inconsistency. – Bhoke Jul 06 '22 at 11:53
  • Seems like it would be the seed then; hope that will solve it! – Mercury Jul 06 '22 at 11:56

If you run pred_results = model.evaluate(validation_generator) right after you fit the model, the weights in memory at that moment are the ones from the last training epoch. What you have to do after model.fit is load the weights saved by model_checkpoint_callback, with something like:

model.load_weights(SAVE_DIR)  # restore the best checkpointed weights, then evaluate
pred_results = model.evaluate(validation_generator)
print(pred_results)
Georgios Livanos