
I am building a CNN model to classify images into 11 classes: 'dew', 'fogsmog', 'frost', 'glaze', 'hail', 'lightning', 'rain', 'rainbow', 'rime', 'sandstorm', and 'snow'. While training I get good training accuracy and good validation accuracy:

Epoch 1/20
131/131 [==============================] - 1012s 8s/step - loss: 1.8284 - accuracy: 0.3724 - val_loss: 1.4365 - val_accuracy: 0.5719
Epoch 2/20
131/131 [==============================] - 67s 511ms/step - loss: 1.3041 - accuracy: 0.5516 - val_loss: 1.1048 - val_accuracy: 0.6515
Epoch 3/20
131/131 [==============================] - 67s 510ms/step - loss: 1.1547 - accuracy: 0.6161 - val_loss: 1.0509 - val_accuracy: 0.6732
Epoch 4/20
131/131 [==============================] - 67s 510ms/step - loss: 1.0681 - accuracy: 0.6394 - val_loss: 1.0644 - val_accuracy: 0.6616
Epoch 5/20
131/131 [==============================] - 66s 505ms/step - loss: 1.0269 - accuracy: 0.6509 - val_loss: 1.0929 - val_accuracy: 0.6363
Epoch 6/20
131/131 [==============================] - 66s 506ms/step - loss: 1.0018 - accuracy: 0.6576 - val_loss: 0.9666 - val_accuracy: 0.6869
Epoch 7/20
131/131 [==============================] - 67s 507ms/step - loss: 0.9384 - accuracy: 0.6790 - val_loss: 0.8623 - val_accuracy: 0.7144
Epoch 8/20
131/131 [==============================] - 66s 505ms/step - loss: 0.9160 - accuracy: 0.6903 - val_loss: 0.8834 - val_accuracy: 0.7180
Epoch 9/20
131/131 [==============================] - 66s 502ms/step - loss: 0.8909 - accuracy: 0.6915 - val_loss: 0.8667 - val_accuracy: 0.7050
Epoch 10/20
131/131 [==============================] - 66s 503ms/step - loss: 0.8476 - accuracy: 0.7075 - val_loss: 0.8100 - val_accuracy: 0.7339
Epoch 11/20
131/131 [==============================] - 67s 509ms/step - loss: 0.8108 - accuracy: 0.7262 - val_loss: 0.8352 - val_accuracy: 0.7137
Epoch 12/20
131/131 [==============================] - 66s 506ms/step - loss: 0.7922 - accuracy: 0.7212 - val_loss: 0.8368 - val_accuracy: 0.7195
Epoch 13/20
131/131 [==============================] - 66s 505ms/step - loss: 0.7424 - accuracy: 0.7442 - val_loss: 0.8813 - val_accuracy: 0.7166
Epoch 14/20
131/131 [==============================] - 66s 503ms/step - loss: 0.7060 - accuracy: 0.7579 - val_loss: 0.8453 - val_accuracy: 0.7231
Epoch 15/20
131/131 [==============================] - 66s 503ms/step - loss: 0.6767 - accuracy: 0.7584 - val_loss: 0.8347 - val_accuracy: 0.7151
Epoch 16/20
131/131 [==============================] - 66s 506ms/step - loss: 0.6692 - accuracy: 0.7632 - val_loss: 0.8038 - val_accuracy: 0.7346
Epoch 17/20
131/131 [==============================] - 67s 507ms/step - loss: 0.6308 - accuracy: 0.7718 - val_loss: 0.7956 - val_accuracy: 0.7455
Epoch 18/20
131/131 [==============================] - 67s 508ms/step - loss: 0.6043 - accuracy: 0.7901 - val_loss: 0.8295 - val_accuracy: 0.7477
Epoch 19/20
131/131 [==============================] - 66s 506ms/step - loss: 0.5632 - accuracy: 0.8018 - val_loss: 0.7918 - val_accuracy: 0.7455
Epoch 20/20
131/131 [==============================] - 67s 510ms/step - loss: 0.5368 - accuracy: 0.8138 - val_loss: 0.7798 - val_accuracy: 0.7549

But when I predict on the test set and submit my results, I get very low accuracy. Here is my model:

from keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = 50

datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=0.25)

train_dataset = datagen.flow_from_directory(
    directory=Train_folder,
    shuffle=True,
    target_size=(IMG_SIZE, IMG_SIZE),
    subset="training",
    classes=['dew','fogsmog','frost','glaze','hail','lightning','rain','rainbow','rime','sandstorm','snow'],
    class_mode='categorical')

validation_dataset = datagen.flow_from_directory(
    directory=Train_folder,
    shuffle=True,
    target_size=(IMG_SIZE, IMG_SIZE),
    subset="validation",
    classes=['dew','fogsmog','frost','glaze','hail','lightning','rain','rainbow','rime','sandstorm','snow'],
    class_mode='categorical')

Found 4168 images belonging to 11 classes.
Found 1383 images belonging to 11 classes.
import tensorflow as tf
from keras.models import Sequential
from keras import layers

model = Sequential([
    layers.Conv2D(32, kernel_size=(3, 3), activation="relu", padding='same',
                  input_shape=(IMG_SIZE, IMG_SIZE, 3)),
    layers.MaxPooling2D((2, 2), padding='same'),
    layers.Dropout(0.25),
    layers.Conv2D(64, (3, 3), activation="relu", padding='same'),
    layers.MaxPooling2D(pool_size=(2, 2), padding='same'),
    layers.Dropout(0.25),
    layers.Conv2D(128, (3, 3), activation="relu", padding='same'),
    layers.MaxPooling2D(pool_size=(2, 2), padding='same'),
    layers.Dropout(0.4),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(11, activation='softmax')
])

model.build()
model.summary()

model.compile(optimizer='adam',
              loss=tf.keras.losses.CategoricalCrossentropy(),
              metrics=['accuracy'])

history = model.fit(
    train_dataset,
    epochs=20,
    validation_data=validation_dataset,
)
model.save('model.tfl')



import os
import cv2
import numpy as np
from tqdm import tqdm

Test_folder = "/content/drive/MyDrive/[NN'22] Project Dataset/Test"
test_data = []
labels = []
for img in tqdm(os.listdir(Test_folder)):
    path = os.path.join(Test_folder, img)
    img_data2 = cv2.imread(path)
    if img_data2 is None:  # skip unreadable files instead of a bare try/except
        continue
    # cv2 loads images as BGR, but the training generator yields RGB
    img_data2 = cv2.cvtColor(img_data2, cv2.COLOR_BGR2RGB)
    img_data2 = cv2.resize(img_data2, (IMG_SIZE, IMG_SIZE))
    test_data.append(img_data2)
    labels.append(img)

# Match the training preprocessing: the generator used rescale=1./255,
# so the test pixels must be scaled to [0, 1] as well
X_data = np.array(test_data).reshape(-1, IMG_SIZE, IMG_SIZE, 3) / 255.0
prediction = model.predict(X_data)
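One likely culprit for the train/test gap: the training generator rescales pixels by 1/255, so any raw uint8 test images would be far outside the distribution the model saw. Below is a minimal sketch (the class list is the one passed to `flow_from_directory` above, so its order matches the model's output indices; the helper names `preprocess` and `indices_to_labels` are my own) of matching the preprocessing and turning softmax rows back into class names for the submission CSV:

```python
import numpy as np

# Same order as the `classes=` argument given to flow_from_directory,
# so index i of the softmax output corresponds to class_names[i].
class_names = ['dew', 'fogsmog', 'frost', 'glaze', 'hail', 'lightning',
               'rain', 'rainbow', 'rime', 'sandstorm', 'snow']

def preprocess(batch):
    """Scale uint8 images to [0, 1] to mirror rescale=1./255 at train time."""
    return batch.astype('float32') / 255.0

def indices_to_labels(probs):
    """Map each softmax row to the class name with the highest probability."""
    return [class_names[i] for i in np.argmax(probs, axis=1)]

# Example with dummy one-hot rows standing in for model.predict output:
fake_probs = np.eye(11)[[0, 5, 10]]
print(indices_to_labels(fake_probs))  # ['dew', 'lightning', 'snow']
```

The labels can then be paired with the collected filenames when writing the CSV.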
  • So what accuracy do you get on your test data? Is your training dataset balanced? – NotAName May 17 '22 at 00:27
  • Can you explain what you mean by balanced? I think it is the same as the training set but without class labels. – thomas Soliman May 17 '22 at 06:09
  • I should predict the class of each image in the test directory and submit it as a CSV file. – thomas Soliman May 17 '22 at 06:12
  • Balanced means that your classes have roughly equal numbers of samples. In imbalanced datasets (if one of the classes has way more samples than the others) the "accuracy" metric gives erroneous results. – NotAName May 17 '22 at 07:15
  • Okay, I got it; the training samples are imbalanced. The smallest class has 121 images and the largest one has 1059. How should I balance the data (how many samples should I take for each class)? – thomas Soliman May 17 '22 at 07:27
  • I made all classes the same sample size. The accuracy increased when submitting the predicted values but is still low (0.675). Any other suggestions? – thomas Soliman May 17 '22 at 09:56
  • 0.675 on test data when validation accuracy is 0.7549 is reasonable in my opinion. Is there a specific test accuracy you're aiming for? – NotAName May 17 '22 at 14:20
  • Well, it is a competition, so I am trying to increase the accuracy as much as I can. – thomas Soliman May 17 '22 at 15:22
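On the imbalance point raised in the comments: rather than discarding images to equalize class sizes, one common alternative is to keep all the data and pass `class_weight` to `model.fit` so the loss up-weights rare classes. A hedged sketch with inverse-frequency weights (the counts below are hypothetical placeholders; the real ones come from the Train folder, where only 121 and 1059 are stated above):

```python
import numpy as np

# Hypothetical per-class image counts; only the min (121) and max (1059)
# are known from the discussion above.
counts = np.array([121, 400, 350, 500, 300, 200, 450, 150, 1059, 380, 320])
n_classes = len(counts)

# Inverse-frequency weighting: rarer classes get proportionally larger
# weights, so each class contributes roughly equally to the loss.
weights = counts.sum() / (n_classes * counts)
class_weight = dict(enumerate(weights))

# Usage with the generators defined earlier:
# model.fit(train_dataset, validation_data=validation_dataset,
#           epochs=20, class_weight=class_weight)
```

This keeps every training image while still counteracting the imbalance, at the cost of noisier gradients for the rare classes.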

0 Answers