I'm trying to build some debugging code into my TensorFlow dataset pipeline. Basically, if TFRecord parsing fails on a certain file, I'd like to be able to figure out which file that is. My dream would be to run a number of asserts in my parsing function that report the filename if they fail.
My pipeline looks something like this:
dataset = (
    tf.data.Dataset.from_tensor_slices(file_list)
    .apply(tf.contrib.data.parallel_interleave(lambda f: tf.data.TFRecordDataset(f), cycle_length=4))
    .map(parse_func, num_parallel_calls=params.num_cores)
    .map(_func_for_other_stuff)
)
Ideally I'd pass the filename through in the parallel_interleave step, but if I have the anonymous function return a (filename, TFRecordDataset) tuple, I get:
TypeError: `map_func` must return a `Dataset` object.
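Since the error says `map_func` has to return a `Dataset`, one direction that might work is to return a single dataset whose elements are (filename, record) pairs instead of a tuple containing a dataset. The following is only a minimal sketch under assumptions: the helper names `with_filename` and `build_dataset` are hypothetical, and it uses `Dataset.interleave` in place of `tf.contrib.data.parallel_interleave` so that it runs on newer TensorFlow versions.

```python
import tensorflow as tf


def with_filename(filename):
    """Hypothetical helper: pair every record of one file with its filename."""
    records = tf.data.TFRecordDataset(filename)
    # Repeat the filename indefinitely; zip stops once `records` is exhausted.
    names = tf.data.Dataset.from_tensors(filename).repeat()
    # The result is a single Dataset of (filename, serialized_record) pairs,
    # which satisfies the "must return a Dataset" requirement.
    return tf.data.Dataset.zip((names, records))


def build_dataset(file_list, cycle_length=4):
    """Hypothetical helper: interleave records from several files, keeping filenames."""
    return (
        tf.data.Dataset.from_tensor_slices(file_list)
        .interleave(with_filename, cycle_length=cycle_length)
    )
```

With this shape, the next map step would receive two arguments (the filename and the serialized record), so parse_func would need to accept both, and the filename could then be included in any assert message.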
I've also tried to include the filename in the file itself, as in this question, but I'm having issues there because the filenames are of variable length.