
When I read a file with tf.read_file, I get something of type tf.string. The documentation says only that it is "Variable length byte arrays. Each element of a Tensor is a byte array." (https://www.tensorflow.org/versions/r0.10/resources/dims_types.html). I have no idea how to interpret this.

I can't do anything with this type. In plain Python you can get elements by index, like my_string[:4], but when I run the following code I get an error.

import tensorflow as tf
import numpy as np

x = tf.constant("This is string")
y = x[:4]


init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
result = sess.run(y)
print result

It says

  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/tensor_shape.py", line 621, in assert_has_rank
    raise ValueError("Shape %s must have rank %d" % (self, rank))
ValueError: Shape () must have rank 1

Also, I cannot convert my string to a tf.float32 tensor. It is a .flo file and it has the magic header "PIEH". This numpy code successfully converts such a header into a number (see the example here https://stackoverflow.com/a/28016469/4744283), but I can't do that with TensorFlow. I tried tf.string_to_number(string, out_type=tf.float32), but it says

tensorflow.python.framework.errors.InvalidArgumentError: StringToNumberOp could not correctly convert string: PIEH

So, what is a tf.string? What is its shape? How can I at least get part of the string? I suppose that if I can get part of it, I can just skip the "PIEH" part.

UPD: I forgot to say that tf.slice(string, [0], [4]) also fails with the same error.

Mr_and_Mrs_D
ckorzhik
  • BTW, you can get a list of ops that accept tf.string data types using this script: https://gist.github.com/yaroslavvb/16bb81fcfb0932169087add47ecb8c3a – Yaroslav Bulatov Aug 11 '16 at 17:50
  • Thanks for the answers! Is this script for a particular version of TF? It doesn't work for at least 0.9. Will try to update to 0.10. – ckorzhik Aug 12 '16 at 20:25
  • No, it doesn't work for 0.10 either: ```$ python list_ops.py Traceback (most recent call last): File "list_ops.py", line 23, in if arg.type == tf.string: File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/dtypes.py", line 244, in __eq__ and self._type_enum == as_dtype(other).as_datatype_enum) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/dtypes.py", line 532, in as_dtype if key == type_value: TypeError: data type not understood ``` Can you help me fix it? – ckorzhik Aug 12 '16 at 20:59
  • Seems like I have fixed it :) Line 23 must be `if arg.type == tf.string.as_datatype_enum:` – ckorzhik Aug 12 '16 at 21:13
  • I suspect you are using version 0.9 or older of TensorFlow; there's no such line in dtypes.py, and it hasn't been touched since the 0.10 release – Yaroslav Bulatov Aug 12 '16 at 22:46

1 Answer


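Unlike Python, where a string can be treated as a list of characters for the purposes of slicing and such, TensorFlow's tf.strings are indivisible values. A single string constant is therefore a scalar with shape (), which is why slicing it fails with "Shape () must have rank 1". For instance, x below is a Tensor with shape (2,), each element of which is a variable-length string.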

x = tf.constant(["This is a string", "This is another string"])

However, to achieve what you want, TensorFlow provides the tf.decode_raw operator. It takes a tf.string tensor as input and reinterprets the string's bytes as a tensor of any other primitive data type. For instance, to interpret the string as a tensor of characters, you can do the following:

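import tensorflow as tf

x = tf.constant("This is string")
# Reinterpret the string's bytes as a rank-1 tensor of uint8 values.
x = tf.decode_raw(x, tf.uint8)
y = x[:4]  # now an ordinary rank-1 slice
sess = tf.InteractiveSession()
print(y.eval())
# prints [ 84 104 105 115]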
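
To make the "raw bytes" idea concrete for the .flo case in the question, here is a minimal sketch in plain Python using only the standard struct module, not TensorFlow. It assumes the common .flo layout (a little-endian float32 magic value, then int32 width and height, then float32 flow values, two per pixel), which the question does not spell out. The helper parse_flo_header and the sample bytes are hypothetical and exist only for illustration.

```python
import struct

raw = b"This is string"
# View the first four bytes as unsigned 8-bit integers,
# which mirrors what decode_raw with tf.uint8 produces.
first_four = list(bytearray(raw[:4]))

# Reinterpret the four bytes "PIEH" as one little-endian float32.
# string_to_number fails here because it parses decimal text such as "3.14",
# not raw bytes.
magic = struct.unpack("<f", b"PIEH")[0]


def parse_flo_header(data):
    """Hypothetical helper: split .flo bytes into (magic, width, height, payload).

    Assumes a 12-byte header: float32 magic, int32 width, int32 height.
    """
    header_magic, width, height = struct.unpack("<fii", data[:12])
    return header_magic, width, height, data[12:]


# Tiny synthetic file: a 2x1 image with two float32 values (u, v) per pixel.
sample = (b"PIEH"
          + struct.pack("<ii", 2, 1)
          + struct.pack("<4f", 0.5, -1.0, 2.0, 0.25))

_, width, height, payload = parse_flo_header(sample)
flow = struct.unpack("<%df" % (2 * width * height), payload)

print(first_four)      # [84, 104, 105, 115]
print(magic)           # 202021.25
print(width, height)   # 2 1
print(flow)            # (0.5, -1.0, 2.0, 0.25)
```

In TensorFlow terms, the analogous step would presumably be tf.decode_raw(contents, tf.float32) followed by slicing off the first three elements, with width and height read through a separate tf.int32 decode of the same string. Treat that as a suggestion to try rather than a tested recipe.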
keveman