How to convert a folder of images into X and Y batches with Keras?

Question

Say I have a folder of images such as:

PetData
|
Dog - images
|
Cat - images

How would I transform it into (x_train, y_train),(x_test, y_test) format? I see this format used extensively with the MNIST dataset which goes like:

mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()

However, I'd like to do this with my own folder of images.

score 2 · Accepted Answer · answered Jul 09 '20 at 00:49

mnist.load_data() returns two tuples of NumPy uint8 arrays, each holding the image data and the matching labels. To get the same kind of arrays from your own folders, load the images yourself (for example with PIL.Image) to build X; y is simply the label of each image, taken from the name of the folder it sits in.

PIL.Image use example:

from PIL import Image
import glob

for infile in glob.glob("*.jpg"):
    im = Image.open(infile)

To split train/test you can use sklearn.model_selection.train_test_split:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
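
Putting the two steps above together, here is a minimal sketch of building X and y from the class folders. The function name load_image_folder, the "*.jpg" pattern and the 150x150 target size are my own assumptions, not anything Keras or PIL prescribes. Every image is resized to one common size, because images of different shapes cannot be stacked into a single array.

```python
from pathlib import Path

import numpy as np
from PIL import Image


def load_image_folder(root, size=(150, 150), pattern="*.jpg"):
    """Load root/<class_name>/<image files> into X (uint8) and y (int labels).

    Hypothetical helper: `size` is (width, height), as PIL expects.
    """
    root = Path(root)
    # One class per sub-folder; sorting keeps the label order reproducible.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    images, labels = [], []
    for label, name in enumerate(class_names):
        for path in sorted((root / name).glob(pattern)):
            with Image.open(path) as im:
                # Force RGB and a common size so every array has the same shape.
                rgb = im.convert("RGB").resize(size)
                images.append(np.asarray(rgb, dtype=np.uint8))
            labels.append(label)
    X = np.stack(images)                    # shape: (n, height, width, 3)
    y = np.asarray(labels, dtype=np.uint8)  # shape: (n,)
    return X, y, class_names
```

With the layout from the question you would call X, y, class_names = load_image_folder("PetData") and then pass X and y to train_test_split as shown above.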

Roohollah Etemadi · Answer 2 · 2020-07-09T05:00:44.273

Suppose your training or test images are in the folder PetData, with each class in its own subfolder (Dog and Cat). You can use ImageDataGenerator to prepare your train/test data as follows:

from keras import layers
from keras import models

model = models.Sequential()
#define your model
#..........
#......


#Using ImageDataGenerator to read images from directories
from keras.preprocessing.image import ImageDataGenerator
train_dir = "PetData/"
#PetData/Dog/  : dog images
#PetData/Cat/  : cat images
train_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(train_dir, target_size=(150, 150), batch_size=20)

#fit the model using train_generator (model.fit accepts generators in recent Keras; fit_generator is deprecated)
history = model.fit(train_generator, steps_per_epoch=100, epochs=30)
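
A note on the numbers above: steps_per_epoch=100 with batch_size=20 means roughly 2,000 images per epoch. If your folder holds a different number of pictures, you may want to derive the value instead of hard-coding it. Below is a minimal sketch using only the standard library. The helper names and the set of file extensions are my own assumptions.

```python
import math
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed set of extensions


def count_images(root):
    """Count image files in the root/<class_name>/ sub-folders (by suffix)."""
    return sum(
        1
        for class_dir in Path(root).iterdir() if class_dir.is_dir()
        for f in class_dir.iterdir()
        if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
    )


def steps_for_one_pass(n_images, batch_size):
    """Number of batches needed to see every image once."""
    return math.ceil(n_images / batch_size)
```

For example, steps_for_one_pass(count_images("PetData/"), 20) could be passed as steps_per_epoch.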

Hope this helps!

score 1 · Answer 3 · answered Jul 09 '20 at 05:29

If you want to import images from a folder on your computer, you can read them one by one and append them to a list.

Your folder format is as you have shown:

PetData
|
Dog - images
|
Cat - images

Assume path is a variable storing the address of PetData folder. We will use OpenCV to import images but you can use other libraries as well.

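import os

import cv2
import numpy as np

data = []
label = []
Files = ['Dog', 'Cat']  # class folders; the list index is the label (Dog=0, Cat=1)

for label_val, files in enumerate(Files):
    cpath = os.path.join(path, files)
    # Assumes the pictures sit in an 'images' subfolder; drop this line if they are directly in Dog/ and Cat/
    cpath = os.path.join(cpath, 'images')
    for img in os.listdir(cpath):
        image_array = cv2.imread(os.path.join(cpath, img), cv2.IMREAD_COLOR)
        if image_array is None:  # cv2.imread returns None for files it cannot read
            continue
        data.append(image_array)
        label.append(label_val)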

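Convert the lists to NumPy arrays. This gives a single 4-D image array only if all images share the same shape, so resize them first (for example with cv2.resize) if they do not.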

data = np.asarray(data)
label = np.asarray(label)

After importing the images you can use train_test_split to split the data for training and testing.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(data, label, test_size=0.33, random_state=42)
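
If you want exactly the nested (x_train, y_train), (x_test, y_test) layout that mnist.load_data() returns, you can wrap the split in a small helper. The sketch below is only an illustration: to_mnist_style is a made-up name, and it uses a plain NumPy permutation instead of train_test_split so that it has no scikit-learn dependency.

```python
import numpy as np


def to_mnist_style(data, label, test_size=0.33, seed=42):
    """Shuffle and split arrays into ((x_train, y_train), (x_test, y_test))."""
    data = np.asarray(data)
    label = np.asarray(label)
    if len(data) != len(label):
        raise ValueError("data and label must have the same length")
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(data))            # shuffled sample indices
    n_test = int(np.ceil(len(data) * test_size))  # size of the held-out part
    test_idx, train_idx = order[:n_test], order[n_test:]
    return (data[train_idx], label[train_idx]), (data[test_idx], label[test_idx])
```

Usage then mirrors the MNIST line from the question: (x_train, y_train), (x_test, y_test) = to_mnist_style(data, label).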
