With TensorFlow, it is easy to find examples in which the data contains numeric values. For example:

x_train = [1, 2, 3, 4]
y_train = [0, -1, -2, -3]

However, does it also work with string category values? For example:

x_train = ["sunny", "rainy", "sunny", "cloudy"]
y_train = ["go outside", "stay inside", "go outside", "go outside"]

If it does not, I assume that TensorFlow has some methodology for working with categorical values, perhaps a clever trick such as converting them to numeric values in some systematic way.

mrry
sapbucket
  • Yes, it can work with strings, but I believe it treats them as byte arrays. It also has some methods for working with strings, like [here](https://www.tensorflow.org/versions/r0.12/api_docs/python/string_ops/). Also look at [this](https://stackoverflow.com/questions/38902433/tensorflow-strings-what-they-are-and-how-to-work-with-them). – Zannith Dec 18 '17 at 18:09
  • I apologize for the bad formatting; I was doing something else and failed to fix my edits in time. The first link is to some TensorFlow docs and the second is a Stack Overflow question. – Zannith Dec 18 '17 at 18:16
  • @Travis: interesting. Thank you for the links. So it does allow them, but it seems to convert them to bytes and treat them as scalars? I'm okay with the byte conversion, but treating them as scalars makes very little sense. I'm still on the hunt to figure this out... – sapbucket Dec 18 '17 at 19:02
  • Yes, in the world of machine learning the algorithms care very little for the human context of their data. It's all numerical inputs and outputs with numerical weights, which allows for fast computation in the data flow graph. Think of neural networks; if I am remembering right, that is the main thing TensorFlow was designed for (it's been a while since I read up on it). – Zannith Dec 20 '17 at 15:21

1 Answer

Yes, TensorFlow does support datasets with categorical features. Perhaps the easiest way to work with them is to use the Feature Column API, which provides methods such as tf.feature_column.categorical_column_with_vocabulary_list() (for dealing with small, known sets of categories) and tf.feature_column.categorical_column_with_hash_bucket() (for dealing with large and potentially unbounded sets of categories).
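To make the idea behind those two kinds of column concrete, here is a minimal plain-Python sketch of the two mapping strategies, applied to the data from the question. It is not TensorFlow code and not how TensorFlow implements them: the helper names (vocabulary_ids, hash_bucket_ids, one_hot), the default_id parameter, and the use of MD5 as the hash are all assumptions made for illustration.

```python
import hashlib


def vocabulary_ids(values, vocabulary, default_id=-1):
    """Map each string to its index in a small, known vocabulary list.

    Strings that are not in the vocabulary get default_id (hypothetical name).
    """
    index = {word: i for i, word in enumerate(vocabulary)}
    return [index.get(v, default_id) for v in values]


def hash_bucket_ids(values, num_buckets):
    """Map each string to a bucket in [0, num_buckets) using a stable hash.

    MD5 is only a stand-in here; different strings may share a bucket.
    """
    ids = []
    for v in values:
        digest = hashlib.md5(v.encode("utf-8")).hexdigest()
        ids.append(int(digest, 16) % num_buckets)
    return ids


def one_hot(ids, depth):
    """Turn integer ids into one-hot rows; out-of-range ids become all zeros."""
    return [[1.0 if i == j else 0.0 for j in range(depth)] for i in ids]


x_train = ["sunny", "rainy", "sunny", "cloudy"]
y_train = ["go outside", "stay inside", "go outside", "go outside"]

# Small, known category sets: look each string up in a vocabulary list.
x_ids = vocabulary_ids(x_train, ["sunny", "rainy", "cloudy"])
y_ids = vocabulary_ids(y_train, ["stay inside", "go outside"])
x_one_hot = one_hot(x_ids, depth=3)

# Large or unbounded category sets: hash each string into a fixed number of buckets.
x_hashed = hash_bucket_ids(x_train, num_buckets=10)

print(x_ids)  # [0, 1, 0, 2]
print(y_ids)  # [1, 0, 1, 1]
```

The vocabulary approach needs the full list of categories up front but never confuses two categories with each other. The hashing approach needs no list, at the cost of occasional collisions when two different strings land in the same bucket.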

Paul
mrry