
I have a Python project whose goal is to build a model that predicts whether an image shows a cat or a dog. My training set contains 24,977 images, and I want to use 10% of it for validation via `validation_split` in Keras. However, when I run this code:

model.fit(x_2, y, epochs = 5, validation_split = 0.1)

this process only shows 703 out of 24,977, which is not what I want (it should be approximately 2,500 images).

You can see 703 being processed in the training output (screenshot not included here).

The shapes of x_2 and y, which I feed to my model, are shown in the attached screenshots (one for y, one for x_2; not included here).

Can anyone explain this behaviour to me and suggest a fix? Thank you so much.

Here is my code:

!pip install tensorflow
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # silence TensorFlow info logs
import pickle
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Load the preprocessed images and labels
x = pickle.load(open('x.pkl', 'rb'))
y = pickle.load(open('y.pkl', 'rb'))

# Scale pixel values to [0, 1]
x_1 = x / 255
x_2 = x_1.astype('float')

model = Sequential()

# input_shape belongs on the first layer, not on a Dense layer after Flatten
model.add(Conv2D(64, (3, 3), activation='relu', input_shape=x.shape[1:]))
model.add(MaxPooling2D((2, 2)))

model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))

model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(2, activation='softmax'))

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

model.fit(x_2, y, epochs=5, validation_split=0.1)

I tried to use 10% of my training-set data for validation but only got about 2% instead, and that is not what I want.

linh dao
  • Can you post your error log message? – partizanos Nov 15 '22 at 21:33
  • Thank you for your comment. Actually I don't have any error message; it just doesn't make sense that when I set validation_split = 0.1, it only processes 703 (please see my edited question for more detail; I added a screenshot). – linh dao Nov 15 '22 at 21:37
  • Please post the x_2.shape and y.shape that you feed to your model. – partizanos Nov 15 '22 at 21:42
  • Please check my post again; I just added two screenshots of their shapes. – linh dao Nov 15 '22 at 22:05
  • This link provides an explanation for your question: [How big should batch size and number of epochs be when fitting a model](https://stackoverflow.com/questions/35050753/how-big-should-batch-size-and-number-of-epochs-be-when-fitting-a-model#:~:text=Generally%20batch%20size%20of%2032,b%2Fw%2050%20to%20100.) – KEHINDE ELELU Nov 15 '22 at 22:39
  • This link provides more detail: [Model epoch and batch size](https://stackoverflow.com/questions/35050753/how-big-should-batch-size-and-number-of-epochs-be-when-fitting-a-model#:~:text=Generally%20batch%20size%20of%2032,b%2Fw%2050%20to%20100.) – KEHINDE ELELU Nov 15 '22 at 22:41

1 Answer


It is actually already doing what you want: 703 is the number of batches per epoch, not the number of images. Since you didn't pass batch_size, Keras defaults to 32, and ceil(22480 / 32) = ceil(702.5) = 703.
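The arithmetic can be checked directly. A minimal sketch, assuming Keras's default batch size of 32 and the usual rounding of the split (the exact split may differ by a sample or so):

```python
import math

num_samples = 24977   # total images in the training set
val_split = 0.1
batch_size = 32       # Keras default when batch_size is not given

num_val = int(num_samples * val_split)              # ~2497 validation images
num_train = num_samples - num_val                   # ~22480 training images
batches_per_epoch = math.ceil(num_train / batch_size)

print(num_val, num_train, batches_per_epoch)  # 2497 22480 703
```

So the "703" shown in the progress bar is the batch count, and roughly 2,500 images are indeed held out for validation.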

partizanos
  • Thank you. So what is the correct batch_size to get 2500 images processed here (instead of 703)? – linh dao Nov 15 '22 at 22:27
  • 703 is the number of batches; this is NOT the number of images. The number of images is 703 × 32 = 22,496 (the last batch contains a few less, giving 22,480), which is 90% of your data. The validation split of 0.1 means that 10% (~2,500) of your data is not used for training, only for validation. So for training you are using about 22,480 images out of the total ~25k. To use only 2,500 images in total you should pass fewer images, i.e. x_2[:2500]. If you find this answer helpful, mark it as correct. – partizanos Nov 15 '22 at 22:44
  • Thank you so much, I'll try your solution. But for now I have another problem: I tend to get this message when I run "model.fit": "The kernel appears to have died. It will restart automatically." Do you know how to fix this? – linh dao Nov 15 '22 at 23:17
  • You have to restart your Jupyter kernel; possibly you have too much data, which crashes the kernel. Try running it as a plain Python script. – partizanos Nov 16 '22 at 09:41
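The slicing suggestion from the comments above can be sketched with dummy arrays. Note the (8, 8, 1) image shape here is invented purely for illustration, since the real shape of x_2 is only shown in the question's screenshots:

```python
import numpy as np

# Dummy stand-ins for x_2 and y; the (8, 8, 1) image shape is hypothetical.
x_2 = np.zeros((24977, 8, 8, 1), dtype="float32")
y = np.zeros((24977,), dtype="int32")

# Feed only the first 2,500 images instead of the full arrays:
x_small, y_small = x_2[:2500], y[:2500]
# model.fit(x_small, y_small, epochs=5, validation_split=0.1)

print(x_small.shape, y_small.shape)  # (2500, 8, 8, 1) (2500,)
```

With validation_split=0.1, this would train on 2,250 images and validate on 250.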