Why does my model perform well on the test dataset but very badly on real data?

Question

I am writing a character-recognition CNN, trained on the EMNIST dataset.

Kaggle Notebook link : https://www.kaggle.com/code/notshrirang/ocr-with-cnn
GitHub Notebook link : https://github.com/NotShrirang/MyOCR

My model does pretty well on the test dataset, but when I use an image captured with my phone, it never predicts correctly.
What should I do? Please help.

Here is my code snippet for model architecture:

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(kernel_size=(8, 8),filters=128, input_shape=(28, 28, 1), activation="relu"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.AveragePooling2D(pool_size=(2, 2)),
    tf.keras.layers.Conv2D(kernel_size=(4, 4), filters=64, activation="relu"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.AveragePooling2D(pool_size=(2, 2)),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(27, activation='softmax')
])

model.compile(
        optimizer=tf.keras.optimizers.SGD(),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
        metrics=['accuracy']
)

model.fit(X_train, y_train, epochs=8, validation_split = 0.2)

Output While training model:

Loss vs Val Loss Plot:

Evaluation:

evaluation = new_model.evaluate(X_test, y_test)
evaluation

I resized the images to 28x28.
All images in the training, test and real data are converted to grayscale.
The dataset stores every image with its matrix transposed, so I transposed them back.
I tried normalizing the data, but it decreased val_accuracy, so I stopped normalizing.
I tried shuffling the data. It increased val_accuracy, so I kept it.
I tried adding and removing layers and changing the number of epochs. It made no difference except to the training time.
I added batch normalization. It increased the time required to train the model.
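
For reference, the preprocessing steps listed above can be sketched roughly as follows for a single phone photo. This is a minimal NumPy-only sketch, not the notebook's actual code: the function name `to_emnist_like`, the `margin` parameter, the median-based inversion rule and the midpoint threshold are all assumptions made for illustration. It expects a photo that is already grayscale, and it keeps pixel values in the 0-255 range because the model above is trained on unnormalized data. Nearest-neighbour resizing can drop very thin strokes, so a real pipeline would probably use a proper image library for that step.

```python
import numpy as np

def to_emnist_like(photo, size=28, margin=2):
    """Turn a grayscale photo (2-D array, values 0-255) into a size x size
    image with a bright character on a dark background, similar to EMNIST.
    Hypothetical helper; thresholds below are illustrative assumptions."""
    img = photo.astype(np.float32)

    # Phone photos usually show dark ink on light paper, EMNIST the reverse.
    # Assumption: the background dominates, so the median is a background value.
    if np.median(img) > 127:
        img = 255.0 - img

    # Crude global threshold halfway between darkest and brightest pixel.
    threshold = (img.min() + img.max()) / 2.0
    mask = img > threshold
    if not mask.any():
        return np.zeros((size, size), dtype=np.float32)

    # Crop to the bounding box of the foreground and zero out the background.
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    crop = img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    crop = np.where(crop > threshold, crop, 0.0)

    # Pad to a square so the character keeps its aspect ratio.
    h, w = crop.shape
    side = max(h, w)
    square = np.zeros((side, side), dtype=np.float32)
    top, left = (side - h) // 2, (side - w) // 2
    square[top:top + h, left:left + w] = crop

    # Nearest-neighbour resize into the inner box, leaving an empty margin.
    inner = size - 2 * margin
    idx = (np.arange(inner) * side / inner).astype(int)
    small = square[np.ix_(idx, idx)]
    out = np.zeros((size, size), dtype=np.float32)
    out[margin:margin + inner, margin:margin + inner] = small
    return out

# Synthetic "photo": light paper with one dark vertical stroke.
photo = np.full((200, 200), 230, dtype=np.uint8)
photo[50:150, 90:110] = 20
sample = to_emnist_like(photo)
print(sample.shape, sample.min(), sample.max())
```

Plotting the result next to a few EMNIST training images is a quick way to check whether the two look alike before feeding the photo to the model.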

A major reason why the model might be not performing well can be because of huge differences in training, validation, and testing data which are sampled from EMNIST data, while your real world images are captured from your phone camera. The difference between the images might be quite a lot, try visualising the two images after performing all the required pre-processing steps. — Azhan Mohammed, Nov 28 '22 at 06:47
The MNIST-like datasets are not meant to be used with real data, they are academic datasets. — Dr. Snoopy, Nov 28 '22 at 09:09

score -2 · Answer 1 · edited Mar 17 '23 at 13:46

There are many techniques to try beyond changing the model:

Increase the amount of training data, which also gives the model more varied inputs.
Use feature extraction and preprocessing functions (MFCC, Fourier transforms) and input transformations (blur, rotation, flips, padding, zoom, or random noise).
Create multiple splits of the data and train with K-fold validation.
Select data randomly, save and reload the model, and adjust parameters through performance callbacks.
Compare different models, or concatenate models.
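
Two of the ideas in the list, input transformations and K-fold validation, can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the helper names `augment` and `k_fold_indices`, the shift range and the noise level are hypothetical, and images are assumed to be 2-D arrays with values in 0-255.

```python
import numpy as np

def augment(image, rng, max_shift=2, noise_std=8.0):
    """Return a randomly shifted, noisy copy of a 2-D image (values 0-255)."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    # Zero-pad, then cut a window of the original size at a random offset.
    padded = np.pad(image.astype(np.float32), max_shift, mode="constant")
    h, w = image.shape
    top, left = max_shift + dy, max_shift + dx
    shifted = padded[top:top + h, left:left + w]
    # Add Gaussian pixel noise and keep values in the valid range.
    noisy = shifted + rng.normal(0.0, noise_std, size=shifted.shape)
    return np.clip(noisy, 0.0, 255.0).astype(np.float32)

def k_fold_indices(n_samples, k, rng):
    """Yield (train_idx, val_idx) pairs over k folds of a shuffled index range."""
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

rng = np.random.default_rng(0)
image = np.zeros((28, 28), dtype=np.float32)
image[10:18, 10:18] = 255.0
augmented = augment(image, rng)

for fold, (train_idx, val_idx) in enumerate(k_fold_indices(10, 5, rng)):
    print(fold, len(train_idx), len(val_idx))
```

Each fold would train a fresh copy of the model on `train_idx` and evaluate it on `val_idx`; augmentation is applied to the training images only.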

Sample: My image recognition template, which I use as a starting point when working out solutions.

import os
from os.path import exists

import tensorflow as tf
import tensorflow_io as tfio

import pandas as pd

import matplotlib.pyplot as plt

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
None
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
physical_devices = tf.config.experimental.list_physical_devices('GPU')
assert len(physical_devices) > 0, "Not enough GPU hardware devices available"
config = tf.config.experimental.set_memory_growth(physical_devices[0], True)
print(physical_devices)
print(config)

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Variables
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
variables = pd.read_excel('F:\\temp\\Python\\excel\\Book 13 (2) (3).xlsx', index_col=None, header=[0])

list_label = [ ]
list_Image = [ ]
list_file_actual = [ ]
list_label_actual = [ 'Candidt Kibt', 'Candidt Kibt', 'Candidt Kibt', 'Candidt Kibt', 'Candidt Kibt', 'Pikaploy', 'Pikaploy', 'Pikaploy', 'Pikaploy', 'Pikaploy' ]

for Index, Image, Label in variables.values:
    print( Label )
    # list_label.append( Label )
    
    image = tf.io.read_file( Image )
    image = tf.io.decode_image(image)
    list_file_actual.append(image)
    image = tf.image.resize(image, [32,32], method='nearest')
    list_Image.append(image)
    
    if Label == 0:
        list_label.append(0)
    else:
        list_label.append(9)
    
    # if Label == 0:
        # list_label_actual.append('Candidt Kibt')
    # else:
        # list_label_actual.append('Pikaploy')


list_label = tf.cast( list_label, dtype=tf.int32 )
list_label = tf.constant( list_label, shape=( 54, 1, 1 ) )
list_Image = tf.cast( list_Image, dtype=tf.int32 )
list_Image = tf.constant( list_Image, shape=( 54, 1, 32, 32, 3 ) )

# print( list_label_actual )
# print( list_label )

checkpoint_path = "F:\\models\\checkpoint\\" + os.path.basename(__file__).split('.')[0] + "\\TF_DataSets_01.h5"
checkpoint_dir = os.path.dirname(checkpoint_path)
loggings = "F:\\models\\checkpoint\\" + os.path.basename(__file__).split('.')[0] + "\\loggings.log"

if not exists(checkpoint_dir) : 
    os.mkdir(checkpoint_dir)
    print("Create directory: " + checkpoint_dir)
    
log_dir = checkpoint_dir

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: DataSet
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
dataset = tf.data.Dataset.from_tensor_slices(( list_Image, list_label ))
list_Image = tf.constant( list_Image, shape=( 54, 32, 32, 3) ).numpy()

print( "===========================================" )
print( "type of variables: " )
print( type(variables) )
print( variables )
print( "variables.values: " )
print( variables.values )

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Model Initialize
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
model = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(input_shape=( 32, 32, 3 )),
    tf.keras.layers.Normalization(mean=3., variance=2.),
    tf.keras.layers.Normalization(mean=4., variance=6.),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Reshape((512, 225)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(96, return_sequences=True, return_state=False)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(96)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(192, activation='relu'),
    tf.keras.layers.Dense(10),
])

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: FileWriter
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
if exists(checkpoint_path) :
    model.load_weights(checkpoint_path)
    print("model load: " + checkpoint_path)
    input("Press Any Key!")

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Callback
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
class custom_callback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        # Stop training once the training accuracy reaches 97%.
        if logs and logs.get('accuracy', 0.0) >= 0.97:
            self.model.stop_training = True
    
custom_callback = custom_callback()

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Optimizer
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
optimizer = tf.keras.optimizers.Nadam(
    learning_rate=0.000001, beta_1=0.9, beta_2=0.999, epsilon=1e-07,
    name='Nadam'
)

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Loss Fn
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""                               
lossfn = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True,  # the final Dense(10) layer has no softmax, so the model outputs logits
    reduction=tf.keras.losses.Reduction.AUTO,
    name='sparse_categorical_crossentropy'
)

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Model Summary
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
model.compile(optimizer=optimizer, loss=lossfn, metrics=['accuracy'] )
model.save_weights(checkpoint_path)

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Training
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
# The dataset already yields batches of one image each, so batch_size must not be passed here.
history = model.fit( dataset, epochs=10000, callbacks=[custom_callback] )

plt.figure(figsize=(6, 6))
plt.title("Actors recognitions")
for i in range(30):
    img = tf.keras.preprocessing.image.array_to_img(
        list_Image[i],
        data_format=None,
        scale=True
    )
    img_array = tf.keras.preprocessing.image.img_to_array(img)
    img_array = tf.expand_dims(img_array, 0)
    predictions = model.predict(img_array)
    score = tf.nn.softmax(predictions[0])
    plt.subplot(6, 6, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(list_file_actual[i])
    plt.xlabel(str(round(score[tf.math.argmax(score).numpy()].numpy(), 2)) + ":" +  str(list_label_actual[tf.math.argmax(score)]))
    
plt.show()

input('...')

Output: The images are of Thai actors, collected from broadcasts on the Internet.

Sample
