
I have a numpy array that I want to write to a TFRecord file. The dimensions of the array, for both the input X and the label y, are [200, 46, 72, 72]. For training my model I want to read the TFRecord file and get slices of shape [72, 72] for both the input and the label data.

I tried to apply the following Stack Overflow answer:

The problem is that this method is really slow, probably because of the number of elements being looped over (200 * 46 = 9200). When I write the entire numpy array as a bytes feature instead of a FloatList I don't have this problem, but then I don't understand how to get [72, 72] slices for each batch.

import tensorflow as tf


def npy_to_tfrecords(X, y):
    # write one record per [72,72] slice to a tfrecords file
    output_file = 'E:\\Documents\\Datasets\\tfrecordtest\\test.tfrecord'
    writer = tf.python_io.TFRecordWriter(output_file)

    # loop over the first two dimensions, so every Example holds one [72,72] slice
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            print(f"{i},{j}")

            # Feature contains a map of string to Feature proto objects
            feature = {}
            feature['X'] = tf.train.Feature(float_list=tf.train.FloatList(value=X[i, j, :, :].flatten()))
            feature['y'] = tf.train.Feature(float_list=tf.train.FloatList(value=y[i, j, :, :].flatten()))

            # construct the Example proto object
            example = tf.train.Example(features=tf.train.Features(feature=feature))

            # serialize the example to a string and write it to disk
            writer.write(example.SerializeToString())

    writer.close()
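
For comparison, the bytes-based variant mentioned above, which writes each full [46,72,72] sample as a single bytes feature, looks roughly like this (the function name and the float32 cast are just how I happened to set it up):

import numpy as np


def npy_to_tfrecords_bytes(X, y):
    # write one record per sample, with the whole [46,72,72] block stored as raw bytes
    output_file = 'E:\\Documents\\Datasets\\tfrecordtest\\test_bytes.tfrecord'
    writer = tf.python_io.TFRecordWriter(output_file)

    for i in range(X.shape[0]):
        feature = {
            'X': tf.train.Feature(bytes_list=tf.train.BytesList(value=[X[i].astype(np.float32).tostring()])),
            'y': tf.train.Feature(bytes_list=tf.train.BytesList(value=[y[i].astype(np.float32).tostring()])),
        }
        example = tf.train.Example(features=tf.train.Features(feature=feature))
        writer.write(example.SerializeToString())

    writer.close()

This version only loops 200 times instead of 9200 and finishes quickly.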

For reading I use roughly the following code:

dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.map(_parse_function, num_parallel_calls=6)
# shuffle_and_repeat returns a new dataset, so the result has to be assigned back
dataset = dataset.apply(tf.contrib.data.shuffle_and_repeat(SHUFFLE_BUFFER))
dataset = dataset.batch(BATCH_SIZE)
iterator = dataset.make_one_shot_iterator()
input_data, label_data = iterator.get_next()
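
For the per-slice float records, my _parse_function is roughly the following; it reads the flattened FloatList back and reshapes it to [72,72]:

def _parse_function(serialized_example):
    # each record holds one flattened [72,72] slice for X and one for y
    features = {
        'X': tf.FixedLenFeature([72 * 72], tf.float32),
        'y': tf.FixedLenFeature([72 * 72], tf.float32),
    }
    parsed = tf.parse_single_example(serialized_example, features)
    X = tf.reshape(parsed['X'], [72, 72])
    y = tf.reshape(parsed['y'], [72, 72])
    return X, y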

When I save the numpy arrays as bytes, the _parse_function returns the whole array, and I cannot figure out how to write a parse function that returns slices.
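
To be concrete, the bytes version of the parse function I have looks roughly like this (the name is just what I call it locally); it decodes the raw bytes back into the full [46,72,72] block per record, not the [72,72] slices I actually want to batch:

def _parse_function_bytes(serialized_example):
    # decodes one record into the full [46,72,72] arrays, not the [72,72] slices I want
    features = {
        'X': tf.FixedLenFeature([], tf.string),
        'y': tf.FixedLenFeature([], tf.string),
    }
    parsed = tf.parse_single_example(serialized_example, features)
    X = tf.reshape(tf.decode_raw(parsed['X'], tf.float32), [46, 72, 72])
    y = tf.reshape(tf.decode_raw(parsed['y'], tf.float32), [46, 72, 72])
    return X, y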

Summary:

  • save 2 numpy arrays to a TFRecord file
  • read the TFRecord file and obtain [72,72] slices of the saved arrays in the batches used for the model