I have a collection of CSV files compressed in LZO format that I want to import into TensorFlow. However, if I try to read them as I would read an uncompressed file, i.e., using
def parse_csv(line):
    columns = tf.decode_csv(line, record_defaults=DEFAULTS, field_delim="\t", use_quote_delim=False) # take a line at a time
    features = dict(zip(COLUMNS, columns)) # create a dictionary out of the features
    labels = tf.to_int32(features.pop('label')) # define the label as an integer
    return features, labels
data_files = glob.glob("my/folder/*")
dataset = tf.data.TextLineDataset(data_files)
dataset = dataset.map(parse_csv)
where DEFAULTS and COLUMNS have been defined before, I obtain the error
tensorflow.python.framework.errors_impl.InvalidArgumentError: Expect 20 fields but have 1 in record 0
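My guess at what is happening with this first error is that the text reader splits the still-compressed bytes on newlines, so the records no longer contain the tab-separated fields. The sketch below only illustrates that guess. It uses zlib from the Python standard library as a stand-in for LZO, and the data, the helper field_counts and the row count are made up for the example.

```python
import zlib

NUM_FIELDS = 20  # matches the "Expect 20 fields" in the error message

def field_counts(blob):
    """Split bytes into non-empty lines and count tab-separated fields per line."""
    lines = [line for line in blob.split(b"\n") if line]
    return [len(line.split(b"\t")) for line in lines]

# Hypothetical data: 50 tab-separated records of 20 fields each.
row = "\t".join(str(i) for i in range(NUM_FIELDS))
plain = "".join(row + "\n" for _ in range(50)).encode("utf-8")

# zlib is only a stand-in here; the real files are LZO-compressed.
compressed = zlib.compress(plain)

counts_plain = field_counts(plain)            # 20 fields in every record
counts_compressed = field_counts(compressed)  # whatever the compressed bytes happen to contain
counts_roundtrip = field_counts(zlib.decompress(compressed))

print(counts_plain == [NUM_FIELDS] * 50)
print(counts_compressed == [NUM_FIELDS] * 50)
print(counts_roundtrip == counts_plain)
```

If this picture is right, the lines have to be decompressed before they reach parse_csv, which is what the attempts described next were aiming at.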
To circumvent this error, I tried both defining a tf.WholeFileReader and using the tf.read_file function, then passing their output to the decompress function in the python-lzo package, but to no avail. I suspect there are several errors: at least one in the way I use the read_file function, because I am not sure I navigate the TF data structures very well, and one in decompress, because I don't really grasp how LZO works.
data_files = glob.glob("my/folder/*")
file_queue = tf.train.string_input_producer(data_files)
value = tf.read_file(file_queue.dequeue())
value = tf.map_fn(lzo.decompress, value)
dataset = tf.map_fn(parse_csv, value)
With this second attempt, I obtain the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: slice index 0 of dimension 0 out of bounds. for 'map/strided_slice' (op: 'StridedSlice') with input shapes: [0], [1], [1], [1] and with computed input tensors: input[1] = <0>, input[2] = <1>, input[3] = <1>.
Could you point out what is wrong and how I could solve it?