I have a CSV file with a large number N of columns: the first column contains the label, and the other N-1 columns hold a numeric representation of my data (chroma features from a music recording).
My idea is to represent the input data as a single array. In practice, I want an equivalent of the standard representation of data in computer vision. Since my data is stored in a CSV file, I need a CSV parser inside the definition of the training input function. This is how I do it:
    def parse_csv(line):
        columns = tf.decode_csv(line, record_defaults=DEFAULTS)  # take a line at a time
        features = {'songID': columns[0], 'x': columns[1:]}  # create a dictionary out of the features
        labels = features.pop('songID')  # define the label
        return features, labels

    def train_input_fn(data_file=fp, batch_size=128):
        """Generate an input function for the Estimator."""
        # Extract lines from input files using the Dataset API.
        dataset = tf.data.TextLineDataset(data_file)
        dataset = dataset.map(parse_csv)
        dataset = dataset.shuffle(1_000_000).repeat().batch(batch_size)
        return dataset.make_one_shot_iterator().get_next()
However, this raises an error that is not very informative: AttributeError: 'list' object has no attribute 'get_shape'. I know that the culprit is the definition of x inside the features dictionary, but I don't know how to correct it because, fundamentally, I don't really understand TensorFlow's data structures yet.
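My current guess, shown as a minimal sketch rather than a verified fix: the CSV decoder returns a plain Python list with one scalar tensor per column, so columns[1:] is still a list and not a tensor, which would explain the missing get_shape. Combining those scalars with tf.stack should give one tensor of shape (N-1,). The three-column DEFAULTS and the sample line below are made up for illustration, and I am assuming a TensorFlow version where the decoder is available as tf.io.decode_csv (in my code above it is tf.decode_csv).

```python
import tensorflow as tf

# Hypothetical defaults for a toy 3-column file:
# one string label followed by two float features.
DEFAULTS = [[""], [0.0], [0.0]]

def parse_csv(line):
    # decode_csv returns a Python list with one scalar tensor per column.
    columns = tf.io.decode_csv(line, record_defaults=DEFAULTS)
    label = columns[0]
    # tf.stack joins the N-1 scalar tensors into a single tensor of shape (N-1,),
    # so 'x' is a tensor instead of a Python list.
    features = {'x': tf.stack(columns[1:])}
    return features, label

# Toy input line, invented for this example.
features, label = parse_csv(tf.constant("song42,0.1,0.9"))
```

With this change the value stored under 'x' has a get_shape method, because it is a tensor. I have not checked how this interacts with the rest of the Estimator pipeline.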