I have images in .png format and their labels in a .csv file, and I want to convert them to TFRecords format. I'm very new to TensorFlow, so it would be great if someone could point me towards what I need to know and how to do this.

I've searched the web, but the examples I found are either outdated or very advanced.

Edit: My images are stored in a single directory.

Thanks

Vaibhav Jha

1 Answer

You have to convert each image into a tf.train.Example in order to write it to a TFRecord file. Here is a simple example of how you can do this.

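Taking a look at the CSV file (the original screenshot is not preserved; judging from the code below, it has a `filename` column and a `label` column).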

Code:

import tensorflow as tf

# The following functions can be used to convert a value to a type compatible
# with tf.train.Example.

def _bytes_feature(value):
    """Returns a bytes_list from a string / byte."""
    if isinstance(value, type(tf.constant(0))):
        value = value.numpy() # BytesList won't unpack a string from an EagerTensor.
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _float_feature(value):
    """Returns a float_list from a float / double."""
    return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))

def _int64_feature(value):
    """Returns an int64_list from a bool / enum / int / uint."""
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def image_example(image_string, label):
    image_shape = tf.image.decode_png(image_string).shape
    feature = {
      'height': _int64_feature(image_shape[0]),
      'width': _int64_feature(image_shape[1]),
      'depth': _int64_feature(image_shape[2]),
      'label': _int64_feature(label),
      'image_raw': _bytes_feature(image_string),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

The image_example function returns a tf.train.Example object for a single image.

You have to iterate over the data frame, create a tf.train.Example object for every image, and write each object to disk using tf.io.TFRecordWriter.

Code:

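import pandas as pd

# Assumption: the labels live in a CSV with 'filename' and 'label' columns.
# 'labels.csv' is a placeholder path; replace it with your own file.
df = pd.read_csv('labels.csv')

record_file = 'images.tfrecords'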
image_labels = {
    'cat': 0,
    'bridge': 1,
}
with tf.io.TFRecordWriter(record_file) as writer:
    for row in df.index:
        full_path = 'data/img/new/' + df['filename'][row]
        label = image_labels[df['label'][row]]
        image_string = tf.io.read_file(full_path)
        tf_example = image_example(image_string, label)
        writer.write(tf_example.SerializeToString())

For a complete tutorial on reading and writing TFRecord files, see the TFRecord and tf.train.Example tutorial in the TensorFlow documentation.
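To check that the file was written correctly, it can help to read it back. The following is a minimal sketch, not part of the original answer: it writes one tiny in-memory PNG using the same feature keys as image_example above, then parses it with tf.data.TFRecordDataset. The helper names `_int64` and `parse_image_example`, the temporary file location, and the 4x5 test image are all assumptions made for illustration.

```python
import os
import tempfile

import tensorflow as tf

# Assumption: a tiny all-black 4x5 RGB image stands in for a real PNG file.
png_bytes = tf.io.encode_png(tf.zeros([4, 5, 3], dtype=tf.uint8)).numpy()


def _int64(value):
    # Hypothetical shorthand for the _int64_feature helper in the answer.
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


example = tf.train.Example(features=tf.train.Features(feature={
    'height': _int64(4),
    'width': _int64(5),
    'depth': _int64(3),
    'label': _int64(1),
    'image_raw': tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[png_bytes])),
}))

# Write a single record to a temporary file (placeholder location).
record_file = os.path.join(tempfile.mkdtemp(), 'images.tfrecords')
with tf.io.TFRecordWriter(record_file) as writer:
    writer.write(example.SerializeToString())

# The description must mirror the keys and types used when writing.
feature_description = {
    'height': tf.io.FixedLenFeature([], tf.int64),
    'width': tf.io.FixedLenFeature([], tf.int64),
    'depth': tf.io.FixedLenFeature([], tf.int64),
    'label': tf.io.FixedLenFeature([], tf.int64),
    'image_raw': tf.io.FixedLenFeature([], tf.string),
}


def parse_image_example(serialized):
    # Turn one serialized tf.train.Example into an (image, label) pair.
    parsed = tf.io.parse_single_example(serialized, feature_description)
    image = tf.io.decode_png(parsed['image_raw'], channels=3)
    return image, parsed['label']


dataset = tf.data.TFRecordDataset(record_file).map(parse_image_example)
for image, label in dataset:
    print(image.shape, int(label))
```

In a real pipeline you would usually resize the decoded images to a common size and call `.batch(...)` on the dataset before training, since images of different sizes cannot be batched together.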

If you have multiple labels, you can create one feature per label in the feature dictionary inside image_example. Code:

def image_example(image_string, label_color, label_type):
    image_shape = tf.image.decode_png(image_string).shape
    feature = {
      'height': _int64_feature(image_shape[0]),
      'width': _int64_feature(image_shape[1]),
      'depth': _int64_feature(image_shape[2]),
      'label_color': _int64_feature(label_color),
      'label_type': _int64_feature(label_type),
      'image_raw': _bytes_feature(image_string),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))
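
When the labels are many one-hot or multi-hot columns, an alternative to one feature per column is to store the whole row as a single int64 list. This is a sketch under assumptions, not the answer's own method: `NUM_LABELS`, `multi_label_example`, `parse_multi_label`, and the six-column label layout are hypothetical names and values chosen for illustration.

```python
import tensorflow as tf

# Assumption: six binary label columns, e.g. three colors and three styles.
NUM_LABELS = 6


def multi_label_example(image_bytes, labels):
    # Store all label columns of one CSV row in a single Int64List feature.
    feature = {
        'image_raw': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[image_bytes])),
        'labels': tf.train.Feature(
            int64_list=tf.train.Int64List(value=labels)),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


# A fixed-length feature of size NUM_LABELS recovers the label vector.
feature_description = {
    'image_raw': tf.io.FixedLenFeature([], tf.string),
    'labels': tf.io.FixedLenFeature([NUM_LABELS], tf.int64),
}


def parse_multi_label(serialized):
    # Return (image, labels); the second element is what training treats
    # as the target when the dataset yields (inputs, targets) pairs.
    parsed = tf.io.parse_single_example(serialized, feature_description)
    image = tf.io.decode_png(parsed['image_raw'], channels=3)
    labels = tf.cast(parsed['labels'], tf.float32)
    return image, labels


# Round trip with a tiny in-memory PNG standing in for a real image file.
png_bytes = tf.io.encode_png(tf.zeros([2, 2, 3], dtype=tf.uint8)).numpy()
serialized = multi_label_example(
    png_bytes, [1, 0, 0, 1, 0, 0]).SerializeToString()
image, labels = parse_multi_label(serialized)
print(labels.numpy())
```

Nothing in the TFRecord itself marks a feature as a label. The parsing function decides that: Keras `fit` treats the second element of each `(inputs, targets)` pair yielded by the dataset as the targets.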
Aniket Bote
  • Can you explain what the loop inside `TFRecordWriter` does? Thanks – Vaibhav Jha Sep 08 '20 at 11:17
  • What if there are multiple labels? – Vaibhav Jha Sep 08 '20 at 11:28
  • The loop inside `TFRecordWriter` reads the file at `full_path` as raw bytes and looks up the integer label (`0` or `1`) from the mapping in the code. Both are passed to `image_example`, which returns the `tf.train.Example` object required for TFRecords. Can you elaborate on what you mean by multiple labels? – Aniket Bote Sep 08 '20 at 11:40
  • So let's say I have images of jeans. Jeans number one is black and ripped, number two is blue and normal, and number three is grey and patched. I have a CSV file of those labels. Row one describes jeans number one, and there are columns for black, blue, grey, ripped, patched, and normal. For jeans number one, the columns for black and ripped are set to 1 and the others to 0. Similarly, for jeans number two, the columns for blue and normal are set to 1 and the others to 0, and so on for jeans number three and for thousands of jeans. – Vaibhav Jha Sep 08 '20 at 11:51
  • Does SequenceExample have any advantage over this? – Vaibhav Jha Sep 08 '20 at 13:00
  • Take a look at [this](https://stackoverflow.com/questions/45634450/what-are-the-advantages-of-using-tf-train-sequenceexample-over-tf-train-example). – Aniket Bote Sep 08 '20 at 13:07
  • Hey Aniket, I have a few more questions. Can we connect on LinkedIn so I can clear my doubts? – Vaibhav Jha Sep 09 '20 at 05:39
  • If you have questions related to this answer, you can ask them here, or you can always ask a new question on SO. – Aniket Bote Sep 09 '20 at 06:18
  • So do models accept the image data in some specific format, or is this enough? Can I pass this directly, or will some extra steps be required? – Vaibhav Jha Sep 09 '20 at 08:00
  • Take a look at [this](https://medium.com/@moritzkrger/speeding-up-keras-with-tfrecord-datasets-5464f9836c36) for more information about how you can use it in model training. – Aniket Bote Sep 09 '20 at 08:24
  • So I have about 15 features (or labels) in a one-hot encoded CSV file. In the `image_example` function, do I just add 15 labels the way you added 'label_color' and 'label_type'? Also, what guarantees that the model will interpret them as labels? – Vaibhav Jha Sep 09 '20 at 11:45