5

Say I have a folder of images such as:

PetData
|
Dog - images
|
Cat - images

How would I transform it into the (x_train, y_train), (x_test, y_test) format? I see this format used extensively with the MNIST dataset, which is loaded like this:

mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()

However, I'd like to do this with my own folder of images.

Juan Diego Lozano
subprimeloads

3 Answers

2

mnist.load_data() returns two tuples, each holding the image contents and the labels as uint8 arrays. You can build the same arrays by loading the images from your folders: a module such as PIL.Image can load the pixel data for X, and y is just the set of labels given by the folder names.

Example of using PIL.Image:

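from PIL import Image
import glob
import numpy as np

X = []
# "*.jpg" matches the JPEG files in the current directory
for infile in glob.glob("*.jpg"):
    im = Image.open(infile)
    X.append(np.asarray(im))  # pixel data as a NumPy array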
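
The labels can be derived from the directory layout itself. Here is a minimal sketch using only the standard library, assuming each class has its own subfolder directly under the dataset root (PetData/Dog, PetData/Cat). The function name `list_labeled_files` and the extension filter are illustrative and not part of any library:

```python
from pathlib import Path


def list_labeled_files(root, extensions=(".jpg", ".jpeg", ".png")):
    """Return (paths, labels, class_names) for a root/<class>/<image> layout."""
    root = Path(root)
    # Sort class folders so the label indices are reproducible across runs.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    paths, labels = [], []
    for index, name in enumerate(class_names):
        for file in sorted((root / name).iterdir()):
            # Skip subfolders and anything that is not an image file.
            if file.is_file() and file.suffix.lower() in extensions:
                paths.append(file)
                labels.append(index)
    return paths, labels, class_names
```

With the layout from the question, class_names would come out as ['Cat', 'Dog'], so cat images get label 0 and dog images get label 1. Each returned path can then be opened with Image.open.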

To split train/test you can use sklearn.model_selection.train_test_split:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
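
To end up with the same (x_train, y_train), (x_test, y_test) layout that mnist.load_data() returns, the split arrays just need to be grouped into two tuples. The sketch below is a NumPy-only shuffle-and-split, used here in place of train_test_split so that it runs without scikit-learn. The function name `split_like_mnist` and its parameters are hypothetical, and the zero-filled array is stand-in data:

```python
import numpy as np


def split_like_mnist(X, y, test_size=0.33, seed=0):
    """Shuffle X and y together and return ((x_train, y_train), (x_test, y_test))."""
    X = np.asarray(X)
    y = np.asarray(y)
    if len(X) != len(y):
        raise ValueError("X and y must contain the same number of samples")
    # One shared permutation keeps every image paired with its label.
    order = np.random.default_rng(seed).permutation(len(X))
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return (X[train_idx], y[train_idx]), (X[test_idx], y[test_idx])


# Stand-in data: 10 fake 4x4 RGB "images" with alternating labels.
X = np.zeros((10, 4, 4, 3), dtype=np.uint8)
y = np.array([0, 1] * 5, dtype=np.uint8)
(x_train, y_train), (x_test, y_test) = split_like_mnist(X, y, test_size=0.3)
print(x_train.shape, x_test.shape)
```

The same unpacking works with the four arrays returned by train_test_split: ((X_train, y_train), (X_test, y_test)).
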
Juan Diego Lozano
1

Suppose your training or test images are in the folder PetData, with each class in its own subfolder (Dog and Cat). You can use ImageDataGenerator to prepare your train/test data as below:

from keras import layers
from keras import models
from keras.preprocessing.image import ImageDataGenerator

model = models.Sequential()
# Define and compile your model here
# ...

# Use ImageDataGenerator to read images from directories
train_dir = "PetData/"
# PetData/Dog/ : dog images
# PetData/Cat/ : cat images
train_datagen = ImageDataGenerator(rescale=1./255)  # scale pixel values to [0, 1]
train_generator = train_datagen.flow_from_directory(train_dir, target_size=(150, 150), batch_size=20)

# Fit the model using train_generator.
# Note: fit_generator is deprecated since TensorFlow 2.1, where model.fit accepts generators directly.
history = model.fit_generator(train_generator, steps_per_epoch=100, epochs=30)
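
In the call above, steps_per_epoch is the number of batches drawn from the generator per epoch. If each epoch should cover every training image once, that is the image count divided by batch_size, rounded up. A small sketch of the arithmetic follows. The 2,000-image figure is an assumption chosen to match steps_per_epoch=100 with batch_size=20 and is not a number given in this answer; `steps_for_one_pass` is a hypothetical helper:

```python
import math


def steps_for_one_pass(num_images, batch_size):
    """Number of batches needed to see every image once (last batch may be partial)."""
    if num_images <= 0 or batch_size <= 0:
        raise ValueError("num_images and batch_size must be positive")
    return math.ceil(num_images / batch_size)


print(steps_for_one_pass(2000, 20))  # 100
print(steps_for_one_pass(2010, 20))  # 101
```
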

Hope this helps!

Roohollah Etemadi
1

If you want to import images from a folder on your computer, you can read them one by one and insert them into a list.

Your folder format is as you have shown:

PetData
|
Dog - images
|
Cat - images

Assume path is a variable storing the path to the PetData folder. We will use OpenCV to import the images, but you can use other libraries as well.

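import os
import cv2

data = []
label = []
Files = ['Dog', 'Cat']  # class folders; the list index is used as the label

# path is assumed to hold the location of the PetData folder
for label_val, files in enumerate(Files):
    cpath = os.path.join(path, files)
    cpath = os.path.join(cpath, 'images')
    for img in os.listdir(cpath):
        image_array = cv2.imread(os.path.join(cpath, img), cv2.IMREAD_COLOR)
        if image_array is None:  # cv2.imread returns None for files it cannot read
            continue
        data.append(image_array)
        label.append(label_val)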

Convert the lists to NumPy arrays:

import numpy as np

data = np.asarray(data)
label = np.asarray(label)
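
Note that np.asarray only yields a regular 4-D array when every image has the same height, width, and channel count. Below is a hedged sketch of a check that could run before the conversion. `stack_images` is a hypothetical helper, and the zero-filled arrays stand in for the results of cv2.imread:

```python
import numpy as np


def stack_images(images):
    """Stack same-shaped image arrays into one (N, H, W, C) array."""
    if not images:
        raise ValueError("no images to stack")
    shapes = {img.shape for img in images}
    if len(shapes) > 1:
        # Mixed sizes cannot form a regular array; resize the images first.
        raise ValueError(f"images have different shapes: {sorted(shapes)}")
    return np.stack(images)


# Stand-in for four loaded 150x150 colour images.
images = [np.zeros((150, 150, 3), dtype=np.uint8) for _ in range(4)]
data = stack_images(images)
print(data.shape, data.dtype)
```

If the check fails, resizing every image to one common size (for example with cv2.resize) before appending it to the list avoids the problem.
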

After importing the images you can use train_test_split to split the data for training and testing.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(data, label, test_size=0.33, random_state=42)