
I am doing handwritten digit recognition using Keras and I have two files: predict.py and train.py.

train.py trains the model (if it is not already trained) and saves it to a directory; otherwise, it loads the trained model from the directory it was saved to and prints the Test Loss and Test Accuracy.

from keras.datasets import mnist
from keras.models import Sequential, model_from_json
from keras.layers import Dense
from keras.utils import to_categorical
import os

def getData():
    (X_train, y_train), (X_test, y_test) = mnist.load_data()
    y_train = to_categorical(y_train, num_classes=10)
    y_test = to_categorical(y_test, num_classes=10)
    X_train = X_train.reshape(X_train.shape[0], 784)
    X_test = X_test.reshape(X_test.shape[0], 784)
    
    # normalizing the data to help with the training
    X_train /= 255
    X_test /= 255

    return X_train, y_train, X_test, y_test

def trainModel(X_train, y_train, X_test, y_test):
    # training parameters
    batch_size = 1
    epochs = 10
    # create model and add layers
    model = Sequential()
    model.add(Dense(64, activation='relu', input_shape=(784,)))
    model.add(Dense(10, activation='softmax'))

    # compiling the sequential model
    model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')
    # training the model and saving metrics in history
    history = model.fit(X_train, y_train,
          batch_size=batch_size, epochs=epochs,
          verbose=2,
          validation_data=(X_test, y_test))

    loss_and_metrics = model.evaluate(X_test, y_test, verbose=2)
    print("Test Loss", loss_and_metrics[0])
    print("Test Accuracy", loss_and_metrics[1])
    
    # Save model structure and weights
    model_json = model.to_json()
    with open('model.json', 'w') as json_file:
        json_file.write(model_json)
    model.save_weights('mnist_model.h5')
    return model

def loadModel():
    with open('model.json', 'r') as json_file:
        model_json = json_file.read()
    model = model_from_json(model_json)
    model.load_weights("mnist_model.h5")
    return model

X_train, y_train, X_test, y_test = getData()

if not os.path.exists('mnist_model.h5'):
    model = trainModel(X_train, y_train, X_test, y_test)
    print('trained model')
    print(model.summary())
else:
    model = loadModel()
    print('loaded model')
    print(model.summary())
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    loss_and_metrics = model.evaluate(X_test, y_test, verbose=2)
    print("Test Loss", loss_and_metrics[0])
    print("Test Accuracy", loss_and_metrics[1])
   

Here is the output (assuming the model was trained earlier, so this time it is just loaded):

('Test Loss', 1.741784990310669)

('Test Accuracy', 0.414)

predict.py, on the other hand, predicts a handwritten number:

from keras.datasets import mnist
from keras.models import model_from_json
from keras.utils import to_categorical

def loadModel():
    with open('model.json', 'r') as json_file:
        model_json = json_file.read()
    model = model_from_json(model_json)
    model.load_weights("mnist_model.h5")
    return model

model = loadModel()

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())

(X_train, y_train), (X_test, y_test) = mnist.load_data()
y_test = to_categorical(y_test, num_classes=10)
X_test = X_test.reshape(X_test.shape[0], 28*28)


loss_and_metrics = model.evaluate(X_test, y_test, verbose=2)

print("Test Loss", loss_and_metrics[0])
print("Test Accuracy", loss_and_metrics[1])

In this case, to my surprise, I am getting the following result:

('Test Loss', 1.8380377866744995)

('Test Accuracy', 0.8856)

In the second file, I am getting a Test Accuracy of 0.8856 (more than double what I was getting before).

Also, model.summary() is the same in both of the files:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 64)                50240     
_________________________________________________________________
dense_2 (Dense)              (None, 10)                650       
=================================================================
Total params: 50,890
Trainable params: 50,890
Non-trainable params: 0
_________________________________________________________________

I can't figure out the reason behind this behavior. Is it normal? Or am I missing something?

1 Answer


The discrepancy comes from the fact that in train.py you call the evaluate() method with normalized data (i.e. divided by 255), while in predict.py you call it with un-normalized data. At inference (i.e. test) time, you should always apply the same pre-processing steps you used on the training data.
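
As a minimal sketch (reusing the same imports as in your files), the evaluation part of predict.py could apply the same preprocessing that getData() performs in train.py:

(X_train, y_train), (X_test, y_test) = mnist.load_data()
y_test = to_categorical(y_test, num_classes=10)
X_test = X_test.reshape(X_test.shape[0], 784)

# same preprocessing as in train.py
X_test = X_test.astype('float32')
X_test /= 255.

loss_and_metrics = model.evaluate(X_test, y_test, verbose=2)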

Further, first convert the data to floating point and then divide it by 255. The arrays returned by mnist.load_data() have an integer dtype (uint8), so in Python 2.x the / operator would perform integer division (zeroing out almost all pixel values), and in Python 3.x the in-place X_train /= 255 and X_test /= 255 would raise an error, since the result of true division cannot be stored back into an integer array:

# cast to float so the in-place division below works
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

# scale pixel values from [0, 255] to [0, 1]
X_train /= 255.
X_test /= 255.
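
Equivalently, if you prefer, a plain (non-in-place) division lets NumPy promote the uint8 arrays to a float dtype automatically, so the explicit cast can be folded into one line:

X_train = X_train / 255.0  # NumPy promotes uint8 to float64 here
X_test = X_test / 255.0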