11

I'm trying to train on a custom dataset with the TensorFlow Object Detection API. The dataset contains 40k training images and labels, both stored as NumPy ndarrays (uint8). The training array has rank 2 (shape [40000, 23456]) and the labels array has rank 1 ([0, ..., 3]). I want to generate a TFRecord file for this dataset. How do I do that?

Govinda Malavipathirana
  • Answered here: https://stackoverflow.com/questions/45427637/is-there-a-more-simple-way-to-handle-batch-inputs-from-tfrecords/45428167#45428167 – Vijay Mariappan May 19 '18 at 04:14

1 Answer

6

This tutorial will walk you through the process of creating TFRecords from your data:

https://medium.com/mostly-ai/tensorflow-records-what-they-are-and-how-to-use-them-c46bc4bbb564
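
As a rough illustration of what such a conversion can look like for the arrays in the question, here is a minimal sketch assuming TensorFlow 2.x. The feature keys `image_raw` and `label` and the helper names `write_tfrecord` and `parse_example` are made up for this example. Note that the Object Detection API expects its own set of feature keys and bounding-box fields, which this sketch does not attempt to cover.

```python
import numpy as np
import tensorflow as tf


def _bytes_feature(value: bytes) -> tf.train.Feature:
    # Wrap raw bytes in a Feature proto.
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def _int64_feature(value: int) -> tf.train.Feature:
    # Wrap a single integer in a Feature proto.
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def write_tfrecord(images: np.ndarray, labels: np.ndarray, path: str) -> None:
    """Write one tf.train.Example per (image row, label) pair.

    The keys "image_raw" and "label" are hypothetical names chosen for
    this sketch, not names required by TensorFlow.
    """
    with tf.io.TFRecordWriter(path) as writer:
        for image, label in zip(images, labels):
            example = tf.train.Example(
                features=tf.train.Features(
                    feature={
                        "image_raw": _bytes_feature(image.tobytes()),
                        "label": _int64_feature(int(label)),
                    }
                )
            )
            writer.write(example.SerializeToString())


def parse_example(serialized, num_features: int):
    """Decode one serialized Example back into (uint8 vector, int64 label)."""
    spec = {
        "image_raw": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized, spec)
    image = tf.io.decode_raw(parsed["image_raw"], tf.uint8)
    image = tf.reshape(image, [num_features])
    return image, parsed["label"]
```

To read the file back, something like `tf.data.TFRecordDataset(path).map(lambda s: parse_example(s, 23456))` should yield the original rows one at a time, loaded from disk on demand.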

However, there are now easier ways to handle preprocessing with the Dataset input pipeline. I prefer to keep my data in its original format and build a preprocessing pipeline to deal with it. Here's the primary guide to read to learn about the Dataset preprocessing pipeline:

https://www.tensorflow.org/programmers_guide/datasets
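
A minimal sketch of such a pipeline, assuming TensorFlow 2.x and arrays small enough to hold in memory (`from_tensor_slices` keeps the whole array in memory, so larger data would need a file-based source instead). The function name `build_pipeline` and the scaling step inside `preprocess` are illustrative choices, not something prescribed by the guide.

```python
import numpy as np
import tensorflow as tf


def build_pipeline(images: np.ndarray, labels: np.ndarray,
                   batch_size: int = 4, shuffle: bool = True):
    """Build a tf.data pipeline directly from in-memory NumPy arrays."""
    # Each dataset element is one (image row, label) pair.
    dataset = tf.data.Dataset.from_tensor_slices((images, labels))

    def preprocess(image, label):
        # Illustrative preprocessing: scale uint8 pixels to [0, 1].
        image = tf.cast(image, tf.float32) / 255.0
        return image, tf.cast(label, tf.int64)

    if shuffle:
        # The buffer size is an arbitrary value chosen for this sketch.
        dataset = dataset.shuffle(buffer_size=1024)
    return dataset.map(preprocess).batch(batch_size)
```

Iterating over the result, for example `for image_batch, label_batch in build_pipeline(images, labels):`, produces one batch at a time, with preprocessing applied as the batches are consumed.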

David Parks
  • Reading the link, it's clear that TensorFlow wants you to load all your data into memory first (as a dataset). The link doesn't describe any way to load data any other way. Other documentation just says, 'whatever, go make a TFRecordDataset' – Monica Heddneck Nov 10 '18 at 02:01
  • I recommend following the second link, using the Dataset pipeline. You will most certainly not be loading your entire dataset into memory. The amount of data loaded at one time is governed by commands such as `batched_dataset = dataset.batch(4)`; see the section on Simple Batching. If you are providing a loader function, then you'll start with a set of IDs (maybe load all the IDs) and use `Dataset.map` to take an ID and return the actual data sample it refers to. If your data is already in TFRecord format, then TF provides readers that load it on demand. – David Parks Nov 10 '18 at 23:15
  • The top link has rotted. – Zuoanqh Jan 22 '19 at 22:42
  • So am I supposed to manually add every single column myself (400+) ? – Maaaaa Aug 05 '19 at 15:33
  • @Maaaaa I'm not clear what you're asking, this would probably be best asked as its own question with some small code example clarifying what your question is. You can reference this question in the new post. – David Parks Aug 05 '19 at 15:36