I have a NumPy array that I want to write to a TFRecord file. Both the input X and the label y have dimensions [200, 46, 72, 72]. For training my model, I want to read the TFRecord file and get slices of shape [72, 72] for both the input and the label data.

I tried to apply the following Stack Overflow answer. The problem is that this method is really slow, probably because of the number of records looped over (200*46 = 9200).

When I write the entire NumPy array as a bytes feature instead of a FloatList, I don't have this problem, but then I don't understand how to get [72, 72] slices for each batch.

import tensorflow as tf

def npy_to_tfrecords(X, y):
    # Write records to a TFRecord file
    output_file = 'E:\\Documents\\Datasets\\tfrecordtest\\test.tfrecord'
    writer = tf.python_io.TFRecordWriter(output_file)
    # Loop through all the [72, 72] slices you want to write
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            # Let's say X is of np.array([[...][...]])
            # Let's say y is of np.array([[0/1]])
            print(f"{i},{j}")
            # Feature contains a map of string to feature proto objects
            feature = {}
            # Select a single [72, 72] slice and flatten it into a list of floats
            feature['X'] = tf.train.Feature(float_list=tf.train.FloatList(value=X[i, j, :, :].flatten()))
            feature['y'] = tf.train.Feature(float_list=tf.train.FloatList(value=y[i, j, :, :].flatten()))
            # Construct the Example proto object
            example = tf.train.Example(features=tf.train.Features(feature=feature))
            # Serialize the example to a string
            serialized = example.SerializeToString()
            # Write the serialized object to disk
            writer.write(serialized)
    writer.close()

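For the bytes variant mentioned above, the byte string itself carries no shape information, so the shape has to be reapplied when decoding. A minimal NumPy-only sketch of that round trip, assuming float32 data in C order; the helper names `to_bytes` and `slices_from_bytes` are hypothetical and not part of any library:

```python
import numpy as np


def to_bytes(arr):
    """Serialize an array to raw bytes (assumption: float32, C order)."""
    return np.ascontiguousarray(arr, dtype=np.float32).tobytes()


def slices_from_bytes(raw, height=72, width=72):
    """Decode raw float32 bytes and view them as a stack of [height, width] slices."""
    flat = np.frombuffer(raw, dtype=np.float32)
    # -1 lets NumPy infer the number of slices from the buffer length
    return flat.reshape(-1, height, width)
```

With a [200, 46, 72, 72] array this should give 200*46 slices, where slice index `i * 46 + j` corresponds to `X[i, j]`.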

For reading, I use roughly the following code:

dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.map(_parse_function, num_parallel_calls=6)
dataset = dataset.apply(tf.contrib.data.shuffle_and_repeat(SHUFFLE_BUFFER))
dataset = dataset.batch(BATCH_SIZE)
iterator = dataset.make_one_shot_iterator()
input_data, label_data = iterator.get_next()

When I save the NumPy arrays as bytes, the _parse_function returns the whole array, and I cannot figure out how to write a _parse_function that returns slices.
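
One possible way to get slices out of byte-encoded records is to let the parse function return the whole volume and then split it afterwards with `flat_map`. The sketch below is only an illustration under several assumptions: it uses the `tf.io` names and eager iteration of TensorFlow 2.x rather than the TF 1.x calls in the question, it stores one record per sample (a [46, 72, 72] volume) as float32 bytes, and the function names `write_volumes`, `parse_volume` and `make_slice_dataset` are hypothetical.

```python
import numpy as np
import tensorflow as tf

# Per-sample shape taken from the question: 46 slices of [72, 72]
DEPTH, HEIGHT, WIDTH = 46, 72, 72


def write_volumes(X, y, path):
    """Write one record per sample, each volume stored as raw float32 bytes."""
    with tf.io.TFRecordWriter(path) as writer:
        for i in range(X.shape[0]):
            x_raw = X[i].astype(np.float32).tobytes()
            y_raw = y[i].astype(np.float32).tobytes()
            feature = {
                "X": tf.train.Feature(bytes_list=tf.train.BytesList(value=[x_raw])),
                "y": tf.train.Feature(bytes_list=tf.train.BytesList(value=[y_raw])),
            }
            example = tf.train.Example(features=tf.train.Features(feature=feature))
            writer.write(example.SerializeToString())


def parse_volume(serialized):
    """Decode one record back into two [DEPTH, HEIGHT, WIDTH] float32 tensors."""
    spec = {
        "X": tf.io.FixedLenFeature([], tf.string),
        "y": tf.io.FixedLenFeature([], tf.string),
    }
    parsed = tf.io.parse_single_example(serialized, spec)
    x = tf.reshape(tf.io.decode_raw(parsed["X"], tf.float32), [DEPTH, HEIGHT, WIDTH])
    y = tf.reshape(tf.io.decode_raw(parsed["y"], tf.float32), [DEPTH, HEIGHT, WIDTH])
    return x, y


def make_slice_dataset(path, batch_size):
    """Build a dataset whose elements are batches of [HEIGHT, WIDTH] slices."""
    dataset = tf.data.TFRecordDataset(path)
    dataset = dataset.map(parse_volume)
    # Turn each volume into DEPTH separate (input slice, label slice) pairs
    dataset = dataset.flat_map(
        lambda x, y: tf.data.Dataset.from_tensor_slices((x, y))
    )
    return dataset.batch(batch_size)
```

Under these assumptions, each batch has shape [batch_size, 72, 72] for both input and label. A shuffle step could be placed between `flat_map` and `batch` so that slices from different samples are mixed.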
Summary:
- Save 2 NumPy arrays to a TFRecord file.
- Read the TFRecord file and obtain slices of the saved NumPy arrays in the batches used for the model.