
How can I create a tensorflow record from a list?

From the documentation here it seems possible. There's also this example, where they convert a numpy array into a byte array using numpy's .tostring() method. However, when I try to pass in:

labels = np.asarray([[1,2,3],[4,5,6]])
...
example = tf.train.Example(features=tf.train.Features(feature={
    'height': _int64_feature(rows),
    'width': _int64_feature(cols),
    'depth': _int64_feature(depth),
    'label': _int64_feature(labels[index]),
    'image_raw': _bytes_feature(image_raw)}))
writer.write(example.SerializeToString())

I get the error:

TypeError: array([1, 2, 3]) has type <type 'numpy.ndarray'>, but expected one of: (<type 'int'>, <type 'long'>)

Which doesn't help me to figure out how to store a list of integers into the tfrecord. I've tried looking through the docs.

Salvador Dali
Steven

4 Answers


After messing around with it for a while and digging further into the documentation, I found my own answer. In the helper function from the example code:

def _int64_feature(value):
  return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
...
'label': _int64_feature(labels[index]),

labels[index] is being wrapped in a list as [value], so you end up with [np.array([1,2,3])], which causes the error.
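To see concretely why this fails, here is a small numpy-only sketch (no TensorFlow needed) of what the [value] wrapping produces versus what Int64List actually needs:

```python
import numpy as np

labels = np.asarray([[1, 2, 3], [4, 5, 6]])

# What value=[value] produces inside _int64_feature:
# a one-element list holding an ndarray, not a flat sequence of ints.
wrapped = [labels[0]]

# What tf.train.Int64List(value=...) can consume: a flat sequence of ints.
flat = list(labels[0])
```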

That wrapping was necessary in the example because tf.train.Int64List() expects a list or numpy array, and the example was passing in a single integer, so they wrapped it in a list.
In the example it looked like this:

label = [1,2,3,4]
...
'label': _int64_feature(label[index]) 

tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
#Where value = [1] in this case

If you want to pass in a list, do this instead:

labels = np.asarray([[1,2,3],[4,5,6]])
...
def _int64_feature(value):
  return tf.train.Feature(int64_list=tf.train.Int64List(value=value))
...
'label': _int64_feature(labels[index]),

I'll probably do a pull request because I found the original documentation for tf.train.Feature to be almost non-existent.

TL;DR

Pass either a list or a numpy array to tf.train.Int64List(), but not a list of lists or a list of numpy arrays.

Steven
    what about extracting the int back out to numpy? – Andrew Hundt Jun 16 '17 at 14:01
  • That's a separate question that should be asked elsewhere. I do not know the internal representation of tfrecords well enough to answer how to convert it back. The easiest way to recover the data from a tfrecord is to run it through a tensorflow graph whose only operation is tf.identity(tfrecord); then you can extract the contents. The internal representation might change over time, so I think this is the most future-proof way to do it. – Steven Jun 16 '17 at 15:35
  • To a TF int tensor: `height = tf.cast(features['height'], tf.int64)` To numpy arrays: https://stackoverflow.com/a/36026969/99379. You're right but it was so closely related. :-) Thanks! – Andrew Hundt Jun 16 '17 at 16:25

As per my understanding, you want to store a list of integers in the tfrecord. Per the documentation, each feature stores one of a packed BytesList, FloatList, or Int64List: https://github.com/tensorflow/tensorflow/blob/r0.9/tensorflow/core/example/example.proto

If you look at the example, they use a function _int64_feature in which the value passed to the function is wrapped in a list:

    def _int64_feature(value):
      return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

In your case you are passing a list as value to _int64_feature, so it gives an error.

Use this instead, which will store the list of int values, or modify the function above to suit your needs:

'label': tf.train.Feature(int64_list=tf.train.Int64List(value=labels[index]))

Hope this is helpful

Aravind Pilla

Int64List, BytesList, and FloatList expect an iterable of the underlying elements (they are repeated proto fields). In your _int64_feature function you use a list as that iterable.

When you pass a scalar, your _int64_feature creates a list with one int64 element in it (exactly as expected). But when you pass an ndarray, you create a list containing one ndarray and pass it to a function that expects a list of int64 values.

So just remove the list construction from your function: int64_list=tf.train.Int64List(value=value)

Salvador Dali

One fix is to change value = [value] to value = value. But if you want to pass a list of lists or a list of numpy arrays, which is very common if you want to save the x, y, z coordinates of all atoms in a molecule, you can first flatten your arrays and then use value = value. For example,

    array_1 = np.array([[1,2,3],[2,3,4]]).ravel()

and if you want to restore it when you read the tfrecord file or during training, you can just use reshape:

    array_1 = array_1.reshape([2,3])
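The flatten-and-reshape round trip can be checked with numpy alone (the 2x3 shape here is just the example's):

```python
import numpy as np

coords = np.array([[1, 2, 3], [2, 3, 4]])

flat = coords.ravel()                  # 1-D array, suitable for Int64List
restored = flat.reshape(coords.shape)  # recover the original 2x3 layout
```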
Jianing Lu