I am still trying to run TensorFlow with my own image data. I was able to create a .tfrecords file with the convert_to() function from this example link

Now I'd like to train the network with the code from that example link.

But it fails in the read_and_decode() function. My changes in that function are:

label = tf.decode_raw(features['label'], tf.string) 

The Error is:

TypeError: DataType string for attr 'out_type' not in list of allowed values: float32, float64, int32, uint8, int16, int8, int64

So how do I 1) read and 2) use string labels for training in TensorFlow?

AlexanderSch

2 Answers

The convert_to_records.py script creates a .tfrecords file in which each record is an Example protocol buffer. That protocol buffer supports string features using the bytes_list kind.

The tf.decode_raw op is used to parse binary strings into image data; it is not designed to parse string (textual) labels. Assuming that features['label'] is a tf.string tensor that holds a numeric string (for example "20"), you can use the tf.string_to_number op to convert it to a number. There is limited other support for string processing inside your TensorFlow program, so if you need a more complicated function to convert the string label to an integer, you should perform this conversion in Python, in your modified version of convert_to_records.py.
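
As a minimal sketch of doing that conversion in Python before the records are written, textual labels can be mapped to integer ids with a small vocabulary. The label names and the variable names below are hypothetical and not part of the example script:

```python
# Hypothetical textual labels; replace with the labels of your own dataset.
raw_labels = ["cat", "dog", "cat", "bird"]

# Build a stable vocabulary: sorted unique labels -> consecutive integer ids.
vocabulary = sorted(set(raw_labels))
label_to_id = {name: index for index, name in enumerate(vocabulary)}

# Reverse mapping, useful for turning predictions back into text later.
id_to_label = {index: name for name, index in label_to_id.items()}

# Integer ids to store in the records instead of the text labels.
label_ids = [label_to_id[name] for name in raw_labels]
print(label_ids)
```

Each id can then be stored as an integer feature instead of a string, so the reading side never has to parse text. Keeping the vocabulary sorted makes the mapping reproducible between runs, provided the set of labels does not change.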

Raute
mrry
  • Is `string_to_number` just for converting _numeric_ strings to numbers, though? I get an exception for arbitrary string values (e.g. `"test"`), whereas `tf.string_to_number("20")` works fine and yields a `20.0` `tf.float32` tensor. – Ben Mosher Feb 15 '18 at 10:50
  • Yes. If you have textual string labels and need to convert them to a number, you could use one of the [`tf.feature_column.categorical_column_*()`](https://www.tensorflow.org/api_docs/python/tf/feature_column) APIs, such as [`tf.feature_column.categorical_column_with_vocabulary_list()`](https://www.tensorflow.org/api_docs/python/tf/feature_column/categorical_column_with_vocabulary_list) or [`tf.feature_column.categorical_column_with_hash_bucket()`](https://www.tensorflow.org/api_docs/python/tf/feature_column/categorical_column_with_hash_bucket). – mrry Feb 16 '18 at 23:11

To add to @mrry's answer, supposing your string is ASCII, you can write it as a bytes feature:

import tensorflow as tf  # TF 1.x API, as used elsewhere in this thread

def _bytes_feature(value):
    # wrap a single bytes value in a Feature holding a BytesList
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def write_proto(filepath, item_id):  # item_id is an ASCII-encodable string; other arguments elided
    # ...
    with tf.python_io.TFRecordWriter(filepath) as writer:
        example = tf.train.Example(features=tf.train.Features(feature={
            # write it as a bytes array, supposing your string is `ascii`
            'item_id': _bytes_feature(bytes(item_id, encoding='ascii')),  # python 3
            # ...
        }))
        writer.write(example.SerializeToString())

Then:

def parse_single_example(example_proto):
    features_dict = tf.parse_single_example(example_proto,
        features={'item_id': tf.FixedLenFeature([], tf.string),
        # ...
        })
    # decode as uint8 aka bytes; the result is a 1-D tensor with one element per character
    item_id = tf.decode_raw(features_dict['item_id'], tf.uint8)
    return item_id

Then, when you get it back in your session, transform it back to a string:

item_id, ... = session.run(your_tfrecords_iterator.get_next())
print(str(item_id.flatten(), 'ascii')) # python 3

I took the uint8 trick from this related SO answer. It works for me, but comments/improvements are welcome.
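
The encode/decode round trip above can be checked without TensorFlow. The following is only a sketch in plain Python, with a hypothetical id value, where a list of integers stands in for the uint8 tensor that the decoding step produces:

```python
# Hypothetical ASCII id; any ASCII-encodable string behaves the same way.
item_id = "image432"

# What gets stored in the bytes feature.
encoded = bytes(item_id, encoding="ascii")

# Plain-Python stand-in for the uint8 values: one integer per character.
as_uint8 = list(encoded)

# Turning the integers back into text, as done after session.run().
decoded = bytes(as_uint8).decode("ascii")
print(as_uint8, decoded)
```

Note that the number of uint8 values equals the length of the string, so ids of different lengths decode to vectors of different lengths.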

Mr_and_Mrs_D
  • I have a TFRecord consisting of images, where one feature is the path to that image on the disk. The path is of the form `path\to\images\image432.jpg`. The length of this path varies from `88` to `91`. When I decode this particular feature as `tf.decode_raw(features['train/path'], tf.uint8)`, I get `ValueError: All shapes must be fully defined: [TensorShape([Dimension(None)]), TensorShape([Dimension(256), Dimension(256), Dimension(1)]), TensorShape([])]`. The first shape in that list corresponds to the path. – Effective_cellist Jan 23 '18 at 16:48
  • I'm having the same issue of all shapes needing to be fully defined. Also, I'd like to return the filename if an Assert statement fails. That doesn't seem to be possible. – Luke Aug 07 '18 at 16:27