
I am currently working on a project using audio data. The first step is to run another model that extracts features from the audio, roughly [400 x 10_000] per wav file; each wav file also has a label that I'm trying to predict. I will then build another model on top of these features to produce my final result.

I don't want to run preprocessing every time I run the model, so my plan was to have a preprocessing pipeline that runs the feature extraction model once and saves the features into a new folder; the second model can then read the saved features directly. I was looking at using TFRecords, but the documentation is quite unhelpful.
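For the non-TFRecord route, the simplest fallback I can think of is one .npy file per wav. This is just a sketch; the `features/` directory and the function names are made up:

```python
import numpy as np
from pathlib import Path

# Hypothetical layout: one .npy file of extracted features per wav file
out_dir = Path('features')
out_dir.mkdir(exist_ok=True)

def save_features(wav_name, features):
    # features: the [400 x 10_000] array produced by the extraction model
    np.save(out_dir / f'{wav_name}.npy', features)

def load_features(wav_name):
    return np.load(out_dir / f'{wav_name}.npy')
```

The second model could then load these arrays directly, or wrap `load_features` in a `tf.data.Dataset.from_generator` pipeline. But I'd still like to understand the TFRecord approach.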


This is what I've come up with to test it so far:

import tensorflow as tf

# `features` is the [400 x 10_000] tensor produced by the feature extraction model
serialized_features = tf.io.serialize_tensor(features)

feature_of_bytes = tf.train.Feature(
    bytes_list=tf.train.BytesList(value=[serialized_features.numpy()]))

features_for_example = {
    'feature0': feature_of_bytes
}
example_proto = tf.train.Example(
    features=tf.train.Features(feature=features_for_example))

filename = 'test.tfrecord'
writer = tf.io.TFRecordWriter(filename)

writer.write(example_proto.SerializeToString())

filenames = [filename]
raw_dataset = tf.data.TFRecordDataset(filenames)

for raw_record in raw_dataset.take(1):
    example = tf.train.Example()
    example.ParseFromString(raw_record.numpy())
    print(example)

But I'm getting this error:

tensorflow.python.framework.errors_impl.DataLossError: truncated record at 0' failed with Read less bytes than requested 

tl;dr:

Getting the above error with TFRecords. Any recommendations to get this example working, or another solution that doesn't use TFRecords?

Bryn
