
I am using a simple neural network on the MNIST dataset, but my model gets stuck at about 42% accuracy on the validation data.

The data is in CSV format: 60000 rows (for the training data) and 785 columns, the first column being the label.

The following is the code that splits and converts the CSV data into the 28x28 images:

import pandas as pd 
import numpy as np
import tensorflow as tf

df = pd.read_csv('mnist_train.csv')
dff = pd.read_csv('mnist_test.csv')

#train set
label = np.array(df.iloc[:,0])
data = np.array(df.iloc[:,1:])
sep = []
for i in range(60000):
    temp = []
    for j in range(28):
        temp.append(data[i,j*28:(j+1)*28])
    sep.append(temp)
    
sep = np.array(sep)
for i in range(60000):
    for j in range(28):
        for k in range(28):
            sep[i,j,k] = sep[i,j,k]/255
labels_array = []
for i in label:
    if i==0:
        labels_array.append([1,0,0,0,0,0,0,0,0,0])
    if i==1:
        labels_array.append([0,1,0,0,0,0,0,0,0,0])
    if i==2:
        labels_array.append([0,0,1,0,0,0,0,0,0,0])
    if i==3:
        labels_array.append([0,0,0,1,0,0,0,0,0,0])
    if i==4:
        labels_array.append([0,0,0,0,1,0,0,0,0,0])
    if i==5:
        labels_array.append([0,0,0,0,0,1,0,0,0,0])
    if i==6:
        labels_array.append([0,0,0,0,0,0,1,0,0,0])
    if i==7:
        labels_array.append([0,0,0,0,0,0,0,1,0,0])
    if i==8:
        labels_array.append([0,0,0,0,0,0,0,0,1,0])
    if i==9:
        labels_array.append([0,0,0,0,0,0,0,0,0,1])

labels_array = np.array(labels_array)

#test set
label_t = np.array(dff.iloc[:,0])
data_t = np.array(dff.iloc[:,1:])
sep_t = []
for i in range(10000):
    temp = []
    for j in range(28):
        temp.append(data_t[i,j*28:(j+1)*28])
    sep_t.append(temp)
    
sep_t = np.array(sep_t)

for i in range(10000):
    for j in range(28):
        for k in range(28):
            sep_t[i,j,k] = sep_t[i,j,k]/255

labels_array_t = []
for i in label_t:
    if i==0:
        labels_array_t.append([1,0,0,0,0,0,0,0,0,0])
    if i==1:
        labels_array_t.append([0,1,0,0,0,0,0,0,0,0])
    if i==2:
        labels_array_t.append([0,0,1,0,0,0,0,0,0,0])
    if i==3:
        labels_array_t.append([0,0,0,1,0,0,0,0,0,0])
    if i==4:
        labels_array_t.append([0,0,0,0,1,0,0,0,0,0])
    if i==5:
        labels_array_t.append([0,0,0,0,0,1,0,0,0,0])
    if i==6:
        labels_array_t.append([0,0,0,0,0,0,1,0,0,0])
    if i==7:
        labels_array_t.append([0,0,0,0,0,0,0,1,0,0])
    if i==8:
        labels_array_t.append([0,0,0,0,0,0,0,0,1,0])
    if i==9:
        labels_array_t.append([0,0,0,0,0,0,0,0,0,1])

labels_array_t = np.array(labels_array_t)
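For reference, the reshaping, scaling, and one-hot loops above can be written with vectorized NumPy operations. One detail in the loop version is worth checking: if the CSV holds integer pixel values, as MNIST CSV files usually do, `np.array(df.iloc[:,1:])` is an integer array, and `sep[i,j,k] = sep[i,j,k]/255` writes each float result back into that integer array, truncating it. Every pixel below 255 then becomes 0 and only pixels equal to 255 become 1, and that loss of information may explain the low accuracy. The sketch below assumes the CSV layout described above (label in column 0, then 784 pixels) and builds a new float array instead; the function name `preprocess_mnist` is hypothetical.

```python
import numpy as np

def preprocess_mnist(raw):
    """Split an (n, 785) MNIST array (label in column 0, then 784 pixels) into
    float32 images of shape (n, 28, 28) scaled to [0, 1], one-hot labels of
    shape (n, 10), and the original integer labels of shape (n,)."""
    raw = np.asarray(raw)
    labels = raw[:, 0].astype(np.int64)
    # Converting to float before dividing keeps the fractional pixel values.
    images = raw[:, 1:].reshape(-1, 28, 28).astype(np.float32) / 255.0
    # Row i of the 10x10 identity matrix is the one-hot vector for digit i.
    one_hot = np.eye(10, dtype=np.float32)[labels]
    return images, one_hot, labels

# The pitfall in the loop version: writing a float back into an integer array
# silently truncates it (128/255 -> 0, 255/255 -> 1).
demo = np.array([[128, 255]])
demo[0, 0] = demo[0, 0] / 255
demo[0, 1] = demo[0, 1] / 255
print(demo)
```

Usage, assuming the same CSV files as in the question: `sep, labels_array, label = preprocess_mnist(pd.read_csv('mnist_train.csv').to_numpy())`, and likewise for the test file.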

And the following is the model:

Dense = tf.keras.layers.Dense
fc_model = tf.keras.Sequential(
    [
      tf.keras.Input(shape=(28,28)),
      tf.keras.layers.Flatten(),
      Dense(128, activation='relu'),
      Dense(32, activation='relu'),
      Dense(10, activation='softmax')])
fc_model.compile(optimizer="Adam", loss="categorical_crossentropy", metrics=["accuracy"])
history = fc_model.fit(sep, labels_array, batch_size=128, validation_data=(sep_t, labels_array_t), epochs=35)

This is the result I get:

Train on 60000 samples, validate on 10000 samples
Epoch 1/35
60000/60000 [==============================] - 2s 31us/sample - loss: 1.8819 - accuracy: 0.3539 - val_loss: 1.6867 - val_accuracy: 0.4068
Epoch 2/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.6392 - accuracy: 0.4126 - val_loss: 1.6407 - val_accuracy: 0.4098
Epoch 3/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.5969 - accuracy: 0.4224 - val_loss: 1.6202 - val_accuracy: 0.4196
Epoch 4/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.5735 - accuracy: 0.4291 - val_loss: 1.6158 - val_accuracy: 0.4220
Epoch 5/35
60000/60000 [==============================] - 1s 25us/sample - loss: 1.5561 - accuracy: 0.4324 - val_loss: 1.6089 - val_accuracy: 0.4229
Epoch 6/35
60000/60000 [==============================] - 1s 24us/sample - loss: 1.5423 - accuracy: 0.4377 - val_loss: 1.6074 - val_accuracy: 0.4181
Epoch 7/35
60000/60000 [==============================] - 2s 25us/sample - loss: 1.5309 - accuracy: 0.4416 - val_loss: 1.6053 - val_accuracy: 0.4226
Epoch 8/35
60000/60000 [==============================] - 1s 24us/sample - loss: 1.5207 - accuracy: 0.4435 - val_loss: 1.6019 - val_accuracy: 0.4252
Epoch 9/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.5111 - accuracy: 0.4480 - val_loss: 1.6015 - val_accuracy: 0.4233
Epoch 10/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.5020 - accuracy: 0.4517 - val_loss: 1.6038 - val_accuracy: 0.4186
Epoch 11/35
60000/60000 [==============================] - 1s 24us/sample - loss: 1.4954 - accuracy: 0.4530 - val_loss: 1.6096 - val_accuracy: 0.4209
Epoch 12/35
60000/60000 [==============================] - 1s 24us/sample - loss: 1.4885 - accuracy: 0.4554 - val_loss: 1.6003 - val_accuracy: 0.4278
Epoch 13/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4813 - accuracy: 0.4573 - val_loss: 1.6072 - val_accuracy: 0.4221
Epoch 14/35
60000/60000 [==============================] - 1s 24us/sample - loss: 1.4749 - accuracy: 0.4598 - val_loss: 1.6105 - val_accuracy: 0.4242
Epoch 15/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4693 - accuracy: 0.4616 - val_loss: 1.6160 - val_accuracy: 0.4213
Epoch 16/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4632 - accuracy: 0.4626 - val_loss: 1.6149 - val_accuracy: 0.4266
Epoch 17/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4580 - accuracy: 0.4642 - val_loss: 1.6145 - val_accuracy: 0.4267
Epoch 18/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4532 - accuracy: 0.4656 - val_loss: 1.6169 - val_accuracy: 0.4330
Epoch 19/35
60000/60000 [==============================] - 1s 24us/sample - loss: 1.4479 - accuracy: 0.4683 - val_loss: 1.6198 - val_accuracy: 0.4236
Epoch 20/35
60000/60000 [==============================] - 1s 24us/sample - loss: 1.4436 - accuracy: 0.4693 - val_loss: 1.6246 - val_accuracy: 0.4264
Epoch 21/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4389 - accuracy: 0.4713 - val_loss: 1.6300 - val_accuracy: 0.4254
Epoch 22/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4350 - accuracy: 0.4730 - val_loss: 1.6296 - val_accuracy: 0.4258
Epoch 23/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4328 - accuracy: 0.4727 - val_loss: 1.6279 - val_accuracy: 0.4257
Epoch 24/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4282 - accuracy: 0.4742 - val_loss: 1.6327 - val_accuracy: 0.4209
Epoch 25/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4242 - accuracy: 0.4745 - val_loss: 1.6387 - val_accuracy: 0.4256
Epoch 26/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4210 - accuracy: 0.4765 - val_loss: 1.6418 - val_accuracy: 0.4240
Epoch 27/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4189 - accuracy: 0.4773 - val_loss: 1.6438 - val_accuracy: 0.4237
Epoch 28/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4151 - accuracy: 0.4781 - val_loss: 1.6526 - val_accuracy: 0.4184
Epoch 29/35
60000/60000 [==============================] - 1s 25us/sample - loss: 1.4129 - accuracy: 0.4788 - val_loss: 1.6572 - val_accuracy: 0.4190
Epoch 30/35
60000/60000 [==============================] - 1s 24us/sample - loss: 1.4097 - accuracy: 0.4801 - val_loss: 1.6535 - val_accuracy: 0.4225
Epoch 31/35
60000/60000 [==============================] - 1s 24us/sample - loss: 1.4070 - accuracy: 0.4795 - val_loss: 1.6689 - val_accuracy: 0.4188
Epoch 32/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4053 - accuracy: 0.4809 - val_loss: 1.6663 - val_accuracy: 0.4194
Epoch 33/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4029 - accuracy: 0.4831 - val_loss: 1.6618 - val_accuracy: 0.4220
Epoch 34/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4000 - accuracy: 0.4832 - val_loss: 1.6603 - val_accuracy: 0.4270
Epoch 35/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.3979 - accuracy: 0.4845 - val_loss: 1.6741 - val_accuracy: 0.4195

Could this be just because of the optimizer? I tried SGD, but to no avail!

NeuroEng

1 Answer


TL;DR: Change the loss to categorical_crossentropy.


The optimizer is not an issue here.

The immediate issue that I can see is that you are using mse as the loss for a multi-class classification problem. Please change it to categorical_crossentropy, which should get you better numbers. Also, don't forget to remove mse from the metrics.

fc_model.compile(optimizer="Adam", loss="categorical_crossentropy", metrics=["accuracy"])
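A small NumPy illustration (not Keras code; the prediction values are made up) of why cross-entropy suits multi-class classification: for a confident but wrong prediction, cross-entropy, -Σ y·log(p), becomes large because it grows without bound as the probability of the true class goes to 0, while the MSE between a one-hot target and a probability vector can never exceed 2/K for K classes.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # -sum(y * log(p)); clipping with eps guards against log(0).
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(-np.sum(y_true * np.log(y_pred)))

def mean_squared_error(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

y_true = np.eye(10)[3]        # the true digit is 3 (one-hot)
y_pred = np.full(10, 0.01)    # hypothetical softmax output:
y_pred[7] = 0.91              # 91% on digit 7, only 1% on digit 3

ce = categorical_crossentropy(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print(f"cross-entropy: {ce:.3f}, mse: {mse:.3f}")
```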

For future reference, you can use the following table of best practices. Even better, spend some time researching the mathematics behind why each of these activations and loss functions is used for specific problems.

[Table image, not reproduced: recommended activation and loss functions for each type of problem]


Side note: although it does not affect performance, you don't need to convert the labels to one-hot vectors.

# YOU CAN SKIP THIS COMPLETELY
for i in label_t:
    if i==0:
        labels_array_t.append([1,0,0,0,0,0,0,0,0,0])
    if i==1:
        labels_array_t.append([0,1,0,0,0,0,0,0,0,0])
    if i==2:
        labels_array_t.append([0,0,1,0,0,0,0,0,0,0])
    if i==3:
        labels_array_t.append([0,0,0,1,0,0,0,0,0,0])
    if i==4:
        labels_array_t.append([0,0,0,0,1,0,0,0,0,0])
    .....

Instead, you can use the original label and label_t arrays directly as your training and validation targets, and change the loss from categorical_crossentropy to sparse_categorical_crossentropy.
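The two losses compute the same quantity and differ only in the label format they expect. A NumPy sketch, assuming softmax probabilities as input: with a one-hot target, only the true-class term of -Σ y·log(p) survives, so the sparse version can index that probability directly with the integer label.

```python
import numpy as np

def categorical_ce(one_hot, probs):
    # Expects one-hot targets of shape (n, 10); returns the mean loss.
    return float(np.mean(-np.sum(one_hot * np.log(probs), axis=1)))

def sparse_categorical_ce(labels, probs):
    # Expects integer targets of shape (n,); picks log(p[true class]) per row.
    return float(np.mean(-np.log(probs[np.arange(len(labels)), labels])))

rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 10))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax
labels = np.array([0, 3, 9, 3, 7])   # integer labels, like `label`
one_hot = np.eye(10)[labels]         # one-hot labels, like `labels_array`

print(categorical_ce(one_hot, probs), sparse_categorical_ce(labels, probs))
```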


EDIT:

Based on your comments and the testing I did on another MNIST dataset, please try the following:

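import tensorflow as tf
import tensorflow_datasets as tfds

# Assumption: ds_train / ds_test are built from TensorFlow Datasets' MNIST,
# with labels one-hot encoded to match CategoricalCrossentropy.
def to_float_and_one_hot(image, label):
    image = tf.cast(tf.squeeze(image, axis=-1), tf.float32) / 255.0  # (28, 28), values in [0, 1]
    return image, tf.one_hot(label, 10)

ds_train, ds_test = tfds.load('mnist', split=['train', 'test'], as_supervised=True)
ds_train = ds_train.map(to_float_and_one_hot).shuffle(10000).batch(128)
ds_test = ds_test.map(to_float_and_one_hot).batch(128)

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10)  # no softmax: the layer outputs logits
])
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'],
)

model.fit(
    ds_train,
    epochs=6,
    validation_data=ds_test,
)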
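In this version the last Dense(10) layer has no softmax activation, so the model outputs raw scores (logits), and from_logits=True makes the loss apply the softmax internally. The NumPy sketch below (illustrative values, single sample) shows that cross-entropy computed from logits via a log-softmax matches cross-entropy computed from softmax probabilities; subtracting the largest logit first is the usual trick to keep exp() from overflowing.

```python
import numpy as np

def ce_from_logits(one_hot, logits):
    # log_softmax(z) = z - max(z) - log(sum(exp(z - max(z)))), numerically stable.
    z = logits - logits.max()
    log_probs = z - np.log(np.sum(np.exp(z)))
    return float(-np.sum(one_hot * log_probs))

def ce_from_probs(one_hot, probs):
    return float(-np.sum(one_hot * np.log(probs)))

logits = np.array([2.0, -1.0, 0.5, 8.0, 0.0, 1.0, -3.0, 0.2, 4.0, -0.5])  # made-up scores
one_hot = np.eye(10)[3]
shifted = np.exp(logits - logits.max())
probs = shifted / shifted.sum()  # softmax

print(ce_from_logits(one_hot, logits), ce_from_probs(one_hot, probs))

# np.exp(1600.0) alone would overflow to inf, but the max-subtracted form stays finite.
stable_big = ce_from_logits(one_hot, logits * 200.0)
```

Because this model outputs logits, its predictions need a softmax (for example tf.nn.softmax(model.predict(x))) if you want probabilities; taking the argmax of the logits already gives the predicted digit.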

Akshay Sehgal
  • Thanks for your answer. I changed the loss function to categorical_crossentropy (as shown in the question), but I still get the same result: the accuracy gets stuck at about 40% – NeuroEng Feb 13 '21 at 03:05
  • The output you are showing (the epoch results) still has `mae`, even though you have removed `mae` from the metrics. Can you post the latest results? – Akshay Sehgal Feb 13 '21 at 03:05
  • Are you sure you are running the newly compiled model? Because the newly compiled model only has `accuracy` while the epoch results show `mae` and `accuracy`. – Akshay Sehgal Feb 13 '21 at 03:07
  • Can you rerun with `relu` instead of `swish` activation? – Akshay Sehgal Feb 13 '21 at 03:10
  • Sorry, I just updated that. The new results are from the model compiled with the new loss function – NeuroEng Feb 13 '21 at 03:10
  • And I changed the activation of the hidden layer to relu as well; still the same thing – NeuroEng Feb 13 '21 at 03:12
  • Also, I see the loss falling properly, so you might just be underfitting. Increase the complexity of the model by adding a Dense layer with 64 neurons before the 32-neuron one. Keep `relu` activation. – Akshay Sehgal Feb 13 '21 at 03:12
  • I updated the code and the results; unfortunately, it's still the same result – NeuroEng Feb 13 '21 at 03:17
  • Can you post the latest results? I'll be able to diagnose then. – Akshay Sehgal Feb 13 '21 at 03:18
  • This might be a weird fix, but I have encountered a bug like this before. Can you use `optimizer="adam"` instead of `"Adam"`? – Akshay Sehgal Feb 13 '21 at 03:23
  • I have attached them. Does the fact that only the validation accuracy gets stuck have anything to do with it? – NeuroEng Feb 13 '21 at 03:23
  • It could be an issue with validation data, but MNIST data (60000) is a very standard dataset. I think we should explore a code bug possibility before checking if data issues are there. – Akshay Sehgal Feb 13 '21 at 03:24
  • Usually, MNIST problems on the same dataset are solved very easily, even with a single hidden dense layer. – Akshay Sehgal Feb 13 '21 at 03:25
  • changed "Adam" to "adam", same thing – NeuroEng Feb 13 '21 at 03:25
  • MNIST data is available in `tensorflow_datasets` by default. Is it necessary for you to use this CSV? The same dataset is available elsewhere, and if your code performs well on that, then the issue is with your dataset. https://www.tensorflow.org/datasets/catalog/mnist – Akshay Sehgal Feb 13 '21 at 03:27
  • Hi, can you change your loss to this - `loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True)` – Akshay Sehgal Feb 13 '21 at 03:36
  • The problem was probably with the dataset, because this solved the issue! Thank you. One last question: if the data is given to the classifier in reverse order (e.g. from 764 to 0 rather than 0 to 764), that would not cause a problem, would it? – NeuroEng Feb 13 '21 at 03:39
  • Not really. It would learn the nonlinear relationship and train the weights accordingly. Are you talking about reverse order during training or during evaluation? – Akshay Sehgal Feb 13 '21 at 03:40
  • If you train on reverse order, then it's all good. BUT if you just pass it like that for prediction, nope, that will cause issues. You can use image augmentation to solve this: while training, make 2 versions of each image (forward and reverse) and copy the same label to both. – Akshay Sehgal Feb 13 '21 at 03:44
  • That way you will have to train on more images, but the model will learn how to work with normal and augmented images – Akshay Sehgal Feb 13 '21 at 03:44
  • Well, for both, because that is how the data was saved in the CSV: if you plot the images, they are upside down. I thought that might have caused the issue. Plus, I changed the loss to sparse_categorical_crossentropy for the new model without changing the labels, like you said – NeuroEng Feb 13 '21 at 03:44
  • With (28, 28) images you can do rotations and horizontal/vertical flips. – Akshay Sehgal Feb 13 '21 at 03:45
  • Ahh, OK, OK, so you need image augmentation for training in that case. – Akshay Sehgal Feb 13 '21 at 03:45
  • For each image, generate rotations and flips, copy the label, and train. – Akshay Sehgal Feb 13 '21 at 03:46
  • That way the model will learn to predict the right labels even for rotated/flipped images. – Akshay Sehgal Feb 13 '21 at 03:46
  • `sparse_categorical_crossentropy` reduces the code quite a bit. Glad you were able to implement it instead of the onehots. – Akshay Sehgal Feb 13 '21 at 03:47
  • thank you for your complete and informative answers. – NeuroEng Feb 13 '21 at 03:49
  • Glad to help anytime. You can connect with me via my profile in case of any future questions. Cheers. – Akshay Sehgal Feb 13 '21 at 03:50
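Two points from the comments can be sketched in NumPy. First, if a row of 784 pixels is stored fully reversed, reshaping it to 28x28 gives the original image rotated by 180 degrees (flipped both vertically and horizontally), so reversing the pixel order once, before both training and prediction, undoes it. Second, the flip/rotation augmentation suggested above amounts to stacking transformed copies of each image and repeating its label. The helper names `unreverse` and `augment` are hypothetical, and some transforms can change a digit's meaning (a 6 rotated by 180 degrees looks like a 9), so it is worth checking which ones actually help on the validation data.

```python
import numpy as np

def unreverse(flat_pixels):
    # Undo fully reversed storage: restore pixel order 0..783, then reshape.
    return np.asarray(flat_pixels)[::-1].reshape(28, 28)

def augment(images, labels):
    """Return the originals plus flipped/rotated copies, each with its label repeated.
    images: (n, 28, 28); labels: (n,) integers or (n, 10) one-hot."""
    variants = [
        images,
        np.flip(images, axis=2),              # horizontal flip (mirror left-right)
        np.flip(images, axis=1),              # vertical flip (upside down)
        np.rot90(images, k=1, axes=(1, 2)),   # 90-degree rotation
        np.rot90(images, k=2, axes=(1, 2)),   # 180-degree rotation
    ]
    aug_images = np.concatenate(variants, axis=0)
    aug_labels = np.concatenate([labels] * len(variants), axis=0)
    return aug_images, aug_labels

img = np.arange(784).reshape(28, 28)
reversed_row = img.reshape(-1)[::-1]
# Reshaping the reversed row directly gives the image rotated by 180 degrees.
rotated = reversed_row.reshape(28, 28)
```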