In Using Keras APIs, how can I import images in batches with exactly K instances of each ID in a given batch?, the answer from Dmytro Prylipko requires a list of tf.data.Dataset objects to pass into tf.data.Dataset.zip.

I need each tf.data.Dataset object to contain instances of only one ID, and I need as many tf.data.Dataset objects as there are IDs.

My data consists of images imported from a directory using the following structure:

path/to/image_dir/
  split_name/  # Ex: 'train'
    label1/  # Ex: 'airplane' or '0015'
      xxx.png
      xxy.png
      xxz.png
    label2/
      xxx.png
      xxy.png
      xxz.png
  split_name/  # Ex: 'test'
    ...

How can I get this list of tf.data.Dataset objects?

1 Answer


Using the answers in the following links, I have come up with an example to implement this requirement:

TensorFlow: training on my own image

Create tensorflow dataset from image local directory

https://www.tensorflow.org/api_docs/python/tf/data/Dataset#from_tensor_slices

Example implementation:

Given the following directory structure:

os.listdir('/tmp/cats-v-dogs/training')

output: ['cats', 'dogs']

import os
import tensorflow as tf

base_path = '/tmp/cats-v-dogs/training'

# Function to build a list (image_list_final) of lists, where each inner list holds the
# filenames of one class, and a parallel list (label_list_final) of lists holding the
# corresponding labels
def read_data():
  image_list_final = []
  label_list_final = []
  label_map_dict = {}
  count_label = 0
  for class_name in os.listdir(base_path):
    image_list = []
    label_list = []
    class_path = os.path.join(base_path, class_name)
    label_map_dict[class_name]=count_label
    for image_name in os.listdir(class_path):
      image_path = os.path.join(class_path, image_name)
      label_list.append(count_label)
      image_list.append(image_path)
    count_label += 1
    image_list_final.append(image_list)
    label_list_final.append(label_list)
  return image_list_final, label_list_final, label_map_dict

image_list_final, label_list_final, label_map_dict = read_data()
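As a quick sanity check of the structure read_data returns, the same logic can be exercised with only the standard library on a throwaway directory. This is just a sketch: the read_data_from helper and the two fake classes below are illustrative, not part of the answer's code, and sorted listings are used so the output order is deterministic (os.listdir alone does not guarantee order).

```python
import os
import tempfile

def read_data_from(base_path):
    # Same logic as read_data above, but parameterized by path and with
    # sorted directory listings for a deterministic result
    image_list_final, label_list_final, label_map_dict = [], [], {}
    for count_label, class_name in enumerate(sorted(os.listdir(base_path))):
        class_path = os.path.join(base_path, class_name)
        label_map_dict[class_name] = count_label
        files = sorted(os.listdir(class_path))
        image_list_final.append([os.path.join(class_path, f) for f in files])
        label_list_final.append([count_label] * len(files))
    return image_list_final, label_list_final, label_map_dict

with tempfile.TemporaryDirectory() as base:
    # Fake directory: two classes, with two and one empty image files
    for cls, names in [('cats', ['a.jpg', 'b.jpg']), ('dogs', ['c.jpg'])]:
        os.makedirs(os.path.join(base, cls))
        for n in names:
            open(os.path.join(base, cls, n), 'w').close()
    images, labels, label_map = read_data_from(base)

print(label_map)  # {'cats': 0, 'dogs': 1}
print(labels)     # [[0, 0], [1]]
```

Note how each inner list of labels is constant: that per-class grouping is exactly what lets you build one tf.data.Dataset per ID.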

# Create the datasets: dataset_list will hold one dataset per class
dataset_list = []
for i, j in zip(image_list_final, label_list_final):
    ds = tf.data.Dataset.from_tensor_slices((tf.constant(i), tf.constant(j)))
    # Optional transformations, applied per dataset as needed:
    # ds = ds.shuffle(len(i))
    # ds = ds.repeat(epochs)
    # ds = ds.map(_parse_function).batch(batch_size)
    dataset_list.append(ds)

# The function below parses the filenames and labels in a dataset into image arrays
def _parse_function(filename, label):
    image_string = tf.io.read_file(filename)
    # Use tf.image.decode_png (or tf.io.decode_image) instead if your files are PNGs
    image_decoded = tf.image.decode_jpeg(image_string, channels=3)
    image = tf.cast(image_decoded, tf.float32)
    return image, label

dataset = dataset_list[0].map(_parse_function)

# Check the result
for i in dataset.take(1):
  print(i)
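To tie this back to the original question (batches with exactly K instances of each ID), the per-class datasets in dataset_list can then be combined with tf.data.Dataset.zip as in Dmytro Prylipko's answer. The interleaving that zip-then-batch produces can be sketched with plain Python lists; the class lists and K below are invented purely for illustration:

```python
# Two "per-class datasets", modeled as plain lists for illustration
cats = ['cat_0.png', 'cat_1.png', 'cat_2.png', 'cat_3.png']
dogs = ['dog_0.png', 'dog_1.png', 'dog_2.png', 'dog_3.png']

K = 2  # desired number of instances of each class per batch

# zip yields one element from every class per step, just like
# tf.data.Dataset.zip; grouping K consecutive steps then gives a batch
# containing exactly K instances of each class
steps = list(zip(cats, dogs))
batches = [steps[i:i + K] for i in range(0, len(steps), K)]

print(batches[0])  # [('cat_0.png', 'dog_0.png'), ('cat_1.png', 'dog_1.png')]
```

With real datasets the equivalent would be roughly tf.data.Dataset.zip(tuple(dataset_list)) followed by .batch(K), after mapping _parse_function over each per-class dataset.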

For a more detailed implementation example, see this notebook: https://nbviewer.jupyter.org/github/abhiatgith/ipynbees/blob/master/How%20Tos/Creating_TF_Datasets_by_Class_Labels_Dogs_vs_Cats.ipynb

Abhilash Rajan
  • 349
  • 1
  • 7
  • Thank you - I see that you've commented out dataset.repeat(epochs), is it important to include this when creating datasets or can it be omitted? – magmacollaris Apr 28 '21 at 12:18
  • It is not required just to create the dataset, but these methods (repeat, shuffle, etc.) can be applied to your dataset object as your use case requires. You can take a look [here](https://www.tensorflow.org/api_docs/python/tf/data/Dataset) for details on the available methods. – Abhilash Rajan Apr 28 '21 at 12:30