
This code prepares image data for training a deep learning model.

The classes variable is a list of object categories, and the label_data variable is a dictionary mapping image file names to their corresponding object categories.

The read_image_and_label function reads in an image file, resizes it, and converts the object categories to a one-hot encoded vector.

The create_dataset function uses tf.data.Dataset to create a dataset from a list of image file paths, applies the read_image_and_label function to each file path, shuffles the data, and batches it.

Finally, the train_file_paths and val_file_paths variables contain the lists of file paths for the training and validation datasets, respectively, and train_data and val_data are the datasets created by calling create_dataset on those lists.

The error I am getting is a TypeError: on line 18, inside read_image_and_label() (the Path(file_path) call), a Tensor object is being converted to a Path, but Path() expects a string, bytes, or os.PathLike object as its argument.

It seems like the file_path argument passed to read_image_and_label() is a Tensor object, not a string or bytes object. This is because the dataset is built with tf.data.Dataset.from_tensor_slices() and the function is applied with Dataset.map(), which traces it as a TensorFlow graph and passes each file path as a string Tensor rather than a Python string.
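Here is a tiny standalone check (assuming TensorFlow 2.x) that shows what Dataset.map() actually passes to the mapped function; my full code follows below.

import tensorflow as tf

def show(file_path):
    # Printed once while tf.data traces this function: a symbolic string Tensor
    # such as Tensor("args_0:0", shape=(), dtype=string), not a Python str
    print(type(file_path), file_path)
    return file_path

ds = tf.data.Dataset.from_tensor_slices(['a.jpg', 'b.jpg'])
ds = ds.map(show)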

import os
from pathlib import Path

import tensorflow as tf

# label_data (a dict mapping image file names to lists of object categories) and
# data_dir (the dataset root directory) are defined earlier in my script.

classes = ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle',
           'bus', 'car', 'cat', 'chair', 'cow',
           'diningtable', 'dog', 'horse', 'motorbike', 'person',
           'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']

# Define the batch size and image size
batch_size = 32
img_size = (224, 224)


def read_image_and_label(file_path):
    # Convert file_path to a Path object and get its name

    file_name = Path(file_path).name
    
    # Get the labels for the file from label_data
    labels = label_data[file_name]
    
    # Read the image file and resize it
    img = tf.io.read_file(file_path) 
    img = tf.io.decode_jpeg(img, channels=3) 
    img = tf.image.resize(img, size=img_size, method='bicubic') 
    img = img / 255.0

    # Convert the labels to a one-hot vector
    label = tf.one_hot([classes.index(l) for l in labels], len(classes))
    return img, label

# Define a function to create a dataset from a list of file paths
def create_dataset(file_paths):
    # Create a dataset of file paths and apply the read_image_and_label function to each file path
    dataset = tf.data.Dataset.from_tensor_slices(file_paths)
    dataset = dataset.map(read_image_and_label)
    # Shuffle and batch the dataset
    dataset = dataset.shuffle(1000).batch(batch_size)
    return dataset

# Define the train and validation file paths
train_file_paths = [os.path.join(data_dir, 'train', f) for f in os.listdir(os.path.join(data_dir, 'train')) if f.endswith('.jpg')]
val_file_paths = [os.path.join(data_dir, 'valid', f) for f in os.listdir(os.path.join(data_dir, 'valid')) if f.endswith('.jpg')]

# print(train_file_paths)

# Create the train and validation datasets
train_data = create_dataset(train_file_paths)
val_data = create_dataset(val_file_paths)

To resolve this I have tried using

def read_image_and_label(file_path):
    # Convert file_path to a string
    file_path = file_path.numpy().decode('utf-8')
    
    # Convert file_path to a Path object and get its name
    file_name = Path(file_path).name
    
    # Get the labels for the file from label_data
    labels = label_data[file_name]
    
    # Read the image file and resize it
    img = tf.io.read_file(file_path) 
    img = tf.io.decode_jpeg(img, channels=3) 
    img = tf.image.resize(img, size=img_size, method='bicubic') 
    img = img / 255.0

    # Convert the labels to a one-hot vector
    label = tf.one_hot([classes.index(l) for l in labels], len(classes))
    return img, label

but this gives the error AttributeError: 'Tensor' object has no attribute 'numpy' (presumably because the function passed to dataset.map() is traced in graph mode, so file_path is a symbolic Tensor that has no .numpy() method).

I have also tried this

def create_dataset(file_paths):
    # Create a dataset of file paths and apply the read_image_and_label function to each file path
    dataset = tf.data.Dataset.from_tensor_slices(file_paths)
    dataset = tf.make_ndarray(dataset)
    dataset = dataset.map(read_image_and_label)
    # Shuffle and batch the dataset
    dataset = dataset.shuffle(1000).batch(batch_size)
    return dataset

but this gives the error AttributeError: 'TensorSliceDataset' object has no attribute 'tensor_shape'

What I want to achieve is a tf.data dataset for training and validation that yields image and label tensors.

1 Answer


You can make use of image_dataset_from_directory, which is a useful function in TensorFlow that lets you easily load image data from a directory structure and convert it into a TensorFlow dataset object.

The usage of image_dataset_from_directory is as follows:

import tensorflow as tf
train_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    'path/to/train/directory',
    batch_size=32,
    image_size=(224, 224),
    shuffle=True,
    seed=42
)

This code will create a TensorFlow dataset object called train_dataset that contains all the images in the path/to/train/directory directory. We’ve also specified a batch size of 32, an image size of 224x224, and set shuffle to True with a random seed of 42.
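Your original pipeline also scales pixel values to [0, 1] by dividing by 255. A minimal sketch of adding the same normalization on top of this dataset (assuming a TensorFlow version where tf.keras.layers.Rescaling is available, roughly 2.6 and newer):

normalization = tf.keras.layers.Rescaling(1.0 / 255)

# Rescale every image batch, leaving the labels untouched
train_dataset = train_dataset.map(lambda images, labels: (normalization(images), labels))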

Further, for creating labels, the same function will automatically create a label for each image based on the name of the directory it is in (class_1, class_2, etc.). For example:

for images, labels in train_dataset:
    print(labels)
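By default these labels are integer class indices. Since your original read_image_and_label builds one-hot vectors with tf.one_hot, you may want to pass label_mode='categorical', which makes the dataset yield one-hot encoded labels instead (a sketch of the same call):

train_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    'path/to/train/directory',
    label_mode='categorical',  # labels become one-hot encoded float vectors
    batch_size=32,
    image_size=(224, 224),
    shuffle=True,
    seed=42
)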

If you want to set the class names (and their order) explicitly, you can pass the class_names argument, a list of the subdirectory names, in the first code snippet.
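A sketch, assuming the training directory contains one subfolder per class named exactly as in your classes list from the question:

train_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    'path/to/train/directory',
    class_names=classes,  # your classes list; must match the subdirectory names
    label_mode='categorical',
    batch_size=32,
    image_size=(224, 224),
    shuffle=True,
    seed=42
)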

TF_Chinmay