Should I use evaluate_generator or evaluate to evaluate my CNN model

Question

I am implementing a CNN using Keras to perform image classification. I used the .fit_generator() method to train the model until a stop condition was met, with the following code:

history_3conv = cnn3.fit_generator(train_data,steps_per_epoch = train_data.n // 98, callbacks = [es,ckpt_3Conv], 
    validation_data = valid_data, validation_steps = valid_data.n // 98,epochs=50)

The last two epochs before stopping were as follows:

As shown, the last training accuracy was 0.91. However, when I use the model.evaluate() method to evaluate the training, testing, and validation sets, I get the following result:

So, my question is: why did I get two different values?

Should I use evaluate_generator()? Or should I fix the seed in flow_from_directory()? Note that I used the following code to perform data augmentation:

trdata = ImageDataGenerator(rotation_range=90,horizontal_flip=True)
vldata = ImageDataGenerator()
train_data = trdata.flow(x_train,y_train,batch_size=98)
valid_data = vldata.flow(x_valid,y_valid,batch_size=98)

In addition, I know that setting use_multiprocessing=False in fit_generator will slow down training significantly. So what do you think would be the best solution?

What is your patience for early stopping? Are you saving only the best weights or the weights of the very last epoch? Generally if you evaluate your model, you should not use any augmentation on the test data. The same goes for validation data used for early stopping. — Tinu, Sep 01 '20 at 09:14
I am only saving the best weights, each time val_loss improves. Here are my stop conditions: monitor='val_loss', patience=7 — baddy, Sep 01 '20 at 09:20
In this case your validation loss / accuracy for evaluation should be equal to the validation loss / accuracy from the epoch where you saved the weights. Notice: If you augmented your validation data in every epoch, this might not be the case anymore. — Tinu, Sep 01 '20 at 09:52
The train_data passed to fit_generator is augmented, so the augmentation is applied in every epoch. I also wanted to mention that the validation loss / accuracy from evaluation is not equal to the validation loss / accuracy from the epoch where I saved the weights — baddy, Sep 01 '20 at 09:54
@baddy, As per the TF Documentation, https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit_generator, `fit_generator` and `evaluate_generator` are deprecated, can you try with `fit` and `evaluate` and check how it goes? Thanks — , Sep 04 '20 at 08:08

score 7 · Accepted Answer · answered Sep 18 '20 at 07:10

model.fit() and model.evaluate() are the way to go, as model.fit_generator() and model.evaluate_generator() are deprecated.

The training and validation data come from generators, and an augmenting generator produces different images on every pass, so you will see a bit of variation in the accuracy. If you use non-augmented validation or test data both as the validation_data of fit_generator and in model.evaluate() or model.evaluate_generator(), the accuracy will not change.
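The effect of augmentation on an evaluation metric can be illustrated with a small toy sketch in plain Python. It is not Keras code: the 4-pixel "images", the fixed `predict` function standing in for a trained model, and the `random_flip` augmentation are all hypothetical names invented for the illustration. The only point is that a fixed model gives the same score on unchanged data every time, while randomly augmented data can give a different score on each pass.

```python
import random


def make_dataset(n_samples, rng):
    """Build toy 4-pixel rows: label 1 if the left half is bright, 0 if it is dark."""
    data = []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        bright = [rng.uniform(0.6, 1.0), rng.uniform(0.6, 1.0)]
        dark = [rng.uniform(0.0, 0.4), rng.uniform(0.0, 0.4)]
        pixels = bright + dark if label == 1 else dark + bright
        data.append((pixels, label))
    return data


def predict(pixels):
    """Fixed stand-in for a trained model: 1 when the left half is brighter."""
    half = len(pixels) // 2
    return 1 if sum(pixels[:half]) > sum(pixels[half:]) else 0


def random_flip(pixels, rng):
    """Hypothetical augmentation: mirror the row with probability 0.5."""
    return pixels[::-1] if rng.random() < 0.5 else pixels


def accuracy(dataset, rng=None):
    """Accuracy of the fixed model; when rng is given, augment each sample first."""
    correct = 0
    for pixels, label in dataset:
        if rng is not None:
            pixels = random_flip(pixels, rng)
        correct += predict(pixels) == label
    return correct / len(dataset)


rng = random.Random(0)
dataset = make_dataset(200, rng)

# Two passes over the unchanged data: the score is identical both times.
clean_first = accuracy(dataset)
clean_second = accuracy(dataset)

# Two passes with random augmentation: each pass sees different inputs.
augmented_first = accuracy(dataset, rng)
augmented_second = accuracy(dataset, rng)

print("clean:    ", clean_first, clean_second)
print("augmented:", augmented_first, augmented_second)
```

The flip is deliberately chosen so that it changes the toy model's prediction, which exaggerates the effect. With a real network and milder augmentation the differences are usually smaller, but the mechanism is the same.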

Below is a simple cat and dog classification program that I ran for one epoch:

The validation data generator applies only a rescale transformation and no other augmentation.
The validation accuracy is displayed at the end of the epoch.
The validation data generator is reset with val_data_gen.reset(). This shouldn't be necessary, though, since no augmentation is applied.
The validation accuracy is then evaluated using both model.evaluate and model.evaluate_generator.

The validation accuracy computed at the end of the epoch matches the accuracy computed with model.evaluate and model.evaluate_generator.

Code:

import os

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Flatten, MaxPooling2D
from tensorflow.keras.preprocessing.image import ImageDataGenerator

_URL = 'https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip'

path_to_zip = tf.keras.utils.get_file('cats_and_dogs.zip', origin=_URL, extract=True)

PATH = os.path.join(os.path.dirname(path_to_zip), 'cats_and_dogs_filtered')

train_dir = os.path.join(PATH, 'train')
validation_dir = os.path.join(PATH, 'validation')

train_cats_dir = os.path.join(train_dir, 'cats')  # directory with our training cat pictures
train_dogs_dir = os.path.join(train_dir, 'dogs')  # directory with our training dog pictures
validation_cats_dir = os.path.join(validation_dir, 'cats')  # directory with our validation cat pictures
validation_dogs_dir = os.path.join(validation_dir, 'dogs')  # directory with our validation dog pictures

num_cats_tr = len(os.listdir(train_cats_dir))
num_dogs_tr = len(os.listdir(train_dogs_dir))

num_cats_val = len(os.listdir(validation_cats_dir))
num_dogs_val = len(os.listdir(validation_dogs_dir))

total_train = num_cats_tr + num_dogs_tr
total_val = num_cats_val + num_dogs_val

batch_size = 1
epochs = 1
IMG_HEIGHT = 150
IMG_WIDTH = 150

train_image_generator = ImageDataGenerator(rescale=1./255,brightness_range=[0.5,1.5]) # Generator for our training data
validation_image_generator = ImageDataGenerator(rescale=1./255) # Generator for our validation data

train_data_gen = train_image_generator.flow_from_directory(batch_size=batch_size,
                                                           directory=train_dir,
                                                           shuffle=True,
                                                           target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                           class_mode='binary')

val_data_gen = validation_image_generator.flow_from_directory(batch_size=batch_size,
                                                              directory=validation_dir,
                                                              target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                              class_mode='binary')

model = Sequential([
    Conv2D(16, 3, padding='same', activation='relu', input_shape=(IMG_HEIGHT, IMG_WIDTH ,3)),
    MaxPooling2D(),
    Conv2D(32, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Conv2D(64, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Flatten(),
    Dense(512, activation='relu'),
    Dense(1)
])

optimizer = 'SGD'

model.compile(optimizer=optimizer,
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])

# fit_generator is deprecated; it is kept here so the run matches the output below
history = model.fit_generator(
    train_data_gen,
    steps_per_epoch=total_train // batch_size,
    epochs=epochs,
    validation_data=val_data_gen,
    validation_steps=total_val // batch_size)


# Reset the validation generator so that evaluation starts from its first batch
val_data_gen.reset()

# Evaluate on Validation data
scores = model.evaluate(val_data_gen)
print("%s%s: %.2f%%" % ("evaluate ",model.metrics_names[1], scores[1]*100))

scores = model.evaluate_generator(val_data_gen)
print("%s%s: %.2f%%" % ("evaluate_generator ",model.metrics_names[1], scores[1]*100))

Output:

Found 2000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.
2000/2000 [==============================] - 74s 37ms/step - loss: 0.6932 - accuracy: 0.5025 - val_loss: 0.6815 - val_accuracy: 0.5000
1000/1000 [==============================] - 11s 11ms/step - loss: 0.6815 - accuracy: 0.5000
evaluate accuracy: 50.00%
evaluate_generator accuracy: 50.00%
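In the output above, validation during training and model.evaluate both run 1000 steps, so they cover the same 1000 images. One detail worth checking when the numbers do not match, offered as a possibility rather than a diagnosis of the question: a step count computed with floor division (n // batch_size) drops the last partial batch, while a full pass over a Keras generator takes ceil(n / batch_size) batches. The sketch below only does the arithmetic. The helper names are hypothetical, the 1000 samples come from the output above, and the batch size of 98 comes from the question.

```python
import math


def steps_with_floor(n_samples, batch_size):
    """Step count as computed by n // batch_size (drops a partial last batch)."""
    return n_samples // batch_size


def steps_full_pass(n_samples, batch_size):
    """Step count needed to see every sample once."""
    return math.ceil(n_samples / batch_size)


def samples_seen(n_samples, batch_size, steps):
    """Samples covered by `steps` batches in a single pass over the data."""
    return min(n_samples, steps * batch_size)


n_val = 1000  # validation images reported in the output above

for batch_size in (1, 98):
    floor_steps = steps_with_floor(n_val, batch_size)
    full_steps = steps_full_pass(n_val, batch_size)
    skipped = n_val - samples_seen(n_val, batch_size, floor_steps)
    print(f"batch_size={batch_size}: floor steps={floor_steps}, "
          f"full-pass steps={full_steps}, samples skipped by floor={skipped}")
```

With batch_size=1, as in the answer's program, both step counts agree. With batch_size=98, the floor version stops after 980 of the 1000 images, so a metric computed that way is taken over a slightly different set of samples than a full pass.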
