
I tried to import a dataset in Google Colab, which is already linked to my Google Drive.

This is the code I am using now.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dropout, Flatten, Dense, Conv2D, MaxPooling2D
from tensorflow.keras.losses import sparse_categorical_crossentropy
from tensorflow.keras.optimizers import Adam
import matplotlib.pyplot as plt
from keras.preprocessing.image import ImageDataGenerator
from PIL import Image
import tensorflow as tf

# dimensions of our images.
img_width, img_height = 150, 150

# Model configuration
batch_size = 50
img_width, img_height, img_num_channels = 32, 32, 3
loss_function = sparse_categorical_crossentropy
no_classes = 100
no_epochs = 100
optimizer = Adam()

train_ds = tf.keras.utils.image_dataset_from_directory(
  '/content/drive/MyDrive/Colab Notebooks/Training_Data',
  validation_split=0.2,
  subset="training",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)

val_ds = tf.keras.utils.image_dataset_from_directory(
  '/content/drive/MyDrive/Colab Notebooks/Training_Data',
  validation_split=0.2,
  subset="validation",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)

# Determine shape of the data
input_shape = (img_width, img_height, img_num_channels)

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(32, (3, 3)))   
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss='categorical_crossentropy',
              optimizer='Adam',
              metrics=['accuracy'])

# this is the augmentation configuration we will use for training
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

# this is the augmentation configuration we will use for testing:
# only rescaling
test_datagen = ImageDataGenerator(rescale=1. / 255)

train_generator = train_datagen.flow_from_directory(
    train_ds,
    target_size=(img_width, img_height),
    batch_size = batch_size,
    class_mode='categorical')

validation_generator = test_datagen.flow_from_directory(
    val_ds,
    target_size=(img_width, img_height),
    batch_size = batch_size,
    class_mode='categorical')

model.fit(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    val_ds=validation_generator,
    validation_steps=nb_validation_samples // batch_size)

Now I get this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-35-1a98ad8aaf01> in <module>()
     82     target_size=(img_width, img_height),
     83     batch_size = batch_size,
---> 84     class_mode='categorical')
     85 
     86 validation_generator = test_datagen.flow_from_directory(

2 frames
/usr/local/lib/python3.7/dist-packages/keras_preprocessing/image/directory_iterator.py in __init__(self, directory, image_data_generator, target_size, color_mode, classes, class_mode, batch_size, shuffle, seed, data_format, save_to_dir, save_prefix, save_format, follow_links, subset, interpolation, dtype)
    113         if not classes:
    114             classes = []
--> 115             for subdir in sorted(os.listdir(directory)):
    116                 if os.path.isdir(os.path.join(directory, subdir)):
    117                     classes.append(subdir)

TypeError: listdir: path should be string, bytes, os.PathLike, integer or None, not BatchDataset

I don't know what to do next. I admit that programming is not my thing, but I need this for my thesis. Can anyone help me solve this? I feel like I'm close to making it work.

  • This is not how you load a dataset (this does not even work conceptually); you should use something like ImageDataGenerator or similar to load your dataset, and you did not describe the dataset anyway. – Dr. Snoopy Jun 23 '22 at 06:51
  • I use a folder of files, and I also have a tar.gz file. I want my code to load this data and run the test, but I don't know how, or where it goes wrong. – อิม อัฐวงศ์ Jun 23 '22 at 07:22
  • This line: `(input_train, target_train), (input_test, target_test) = directory` does not load a dataset; this is not how loading datasets works. I already suggested what you can use: ImageDataGenerator can load image class data from folders. – Dr. Snoopy Jun 23 '22 at 07:31
  • If you have a tar.gz file, you might additionally need to extract it first before using TensorFlow on it (see the sketch after these comments). – Vishal Balaji Jun 23 '22 at 08:13
  • I don't know what I should do. Does my new code still not work? I have tried many things and, as I wrote before, I'm not good at programming; others can do this while I'm still stuck here. – อิม อัฐวงศ์ Jun 23 '22 at 09:00
  • The first argument to `flow_from_directory` needs to be a directory, not an `image_dataset_from_directory` object. Check here - https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator#flow_from_directory. These are two entirely different functions. – Vishal Balaji Jun 23 '22 at 09:17
  • If you use flow_from_directory, there is no need to use ImageDataGenerator. – Dr. Snoopy Jun 23 '22 at 09:24
  • Will try; whether or not it works, I will report back in a comment later. – อิม อัฐวงศ์ Jun 23 '22 at 10:51
  • Now I got this error: `NameError: name 'train_datagen' is not defined`, raised at the line `train_generator = train_datagen.flow_from_directory(directory=r"/content/drive/MyDrive/Colab Notebooks/Training_Data", target_size=(224, 224),`. What am I missing that causes this error? – อิม อัฐวงศ์ Jun 24 '22 at 03:25
  • Did you figure it out? I still have the same problem. – John Angelopoulos Aug 09 '22 at 10:26
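
As a side note on the tar.gz file mentioned in the comments above: a minimal sketch for extracting such an archive with Python's standard tarfile module, using hypothetical paths:

import tarfile

# Hypothetical paths; adjust them to wherever your archive and data actually live.
with tarfile.open('/content/drive/MyDrive/dataset.tar.gz', 'r:gz') as archive:
    archive.extractall('/content/dataset')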

1 Answer


As correctly mentioned by @Dr. Snoopy and @Vishal Balaji, these APIs take a directory path (a string), not a dataset object. With `image_dataset_from_directory`, it looks something like below:

train_dataset="/content/drive/MyDrive/MY WORK/cats_and_dogs_filtered/train"

img_width, img_height = 150, 150
batch_size = 32
# Pass the directory path (a string) here, not a dataset object
train_ds = tf.keras.utils.image_dataset_from_directory(
  train_dataset,
  validation_split=0.2, 
  subset="training",
  seed=123,
  image_size=(img_height, img_width))

You are using two different APIs (`image_dataset_from_directory` and `flow_from_directory`) to import the dataset from the directory, and trying to train the model with both. You should use only one of them.
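
If you stay with `image_dataset_from_directory`, the datasets it returns can be passed straight to `model.fit`; a minimal sketch, assuming the `train_ds`/`val_ds` splits from the question and a model whose last layer outputs one unit per class (this API yields integer labels, so a sparse loss matches):

# Minimal sketch: train directly on the tf.data datasets, no generators needed.
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.fit(train_ds,
          validation_data=val_ds,  # the keyword is validation_data, not val_ds
          epochs=10)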

Check the code below to import and augment the dataset using the `flow_from_directory` API:

# this is the augmentation configuration we will use for training
train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,)

train_generator = train_datagen.flow_from_directory(
    train_dataset,
    target_size=(150,150),
    batch_size = 32,
    class_mode='sparse')  # 'sparse' class_mode yields integer labels for the image dataset

# We should apply the same data preprocessing to the validation data while training the model.
# Assumed path: the matching 'validation' folder of the same dataset.
validation_dataset = "/content/drive/MyDrive/MY WORK/cats_and_dogs_filtered/validation"

validation_generator = train_datagen.flow_from_directory(
    validation_dataset,
    target_size=(150,150),
    batch_size = 32,
    class_mode='sparse')
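
These generators can then be passed to `model.fit` in the same way; a minimal sketch, assuming the model is compiled with `sparse_categorical_crossentropy` to match `class_mode='sparse'` (the epoch count is arbitrary):

model.fit(
    train_generator,
    validation_data=validation_generator,  # not val_ds= as in the question
    epochs=10)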

Please have a look at this gist for more details.