
EDIT: I'm using TensorFlow version 0.10.0rc0

I'm currently trying to get tf.contrib.learn.read_batch_examples working with a TensorFlow (SKFlow/tf.contrib) Estimator, specifically the LinearClassifier. I create a read_batch_examples op that reads a CSV file, passing tf.decode_csv with appropriate default records as the parse_fn parameter. I then feed that op to my input_fn for fitting the Estimator, but when that's run I receive the following error:

ValueError: Tensor("centered_bias_weight:0", shape=(1,), dtype=float32_ref) must be from the same graph as Tensor("linear/linear/BiasAdd:0", shape=(?, 1), dtype=float32).

I'm confused because neither of those Tensors appears to come from the read_batch_examples op. The code works if I run the op beforehand and feed the input in as an array of values instead. While this workaround exists, it's unhelpful because I am working with large datasets for which I need to batch my inputs. Iterating over Estimator.fit (currently equivalent to Estimator.partial_fit) isn't nearly as fast as being able to feed in data as it trains, so having this working is ideal. Any ideas? I'll post the non-functioning code below.

def input_fn(examples_dict):
    continuous_cols = {k: tf.cast(examples_dict[k], dtype=tf.float32)
                       for k in CONTINUOUS_FEATURES}
    categorical_cols = {
        k: tf.SparseTensor(
            indices=[[i, 0] for i in xrange(examples_dict[k].get_shape()[0])],
            values=examples_dict[k],
            shape=[int(examples_dict[k].get_shape()[0]), 1])
        for k in CATEGORICAL_FEATURES}
    feature_cols = dict(continuous_cols)
    feature_cols.update(categorical_cols)
    label = tf.contrib.layers.one_hot_encoding(labels=examples_dict[LABEL],
                                               num_classes=2,
                                               on_value=1,
                                               off_value=0)
    return feature_cols, label
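For readers unfamiliar with the layout the SparseTensor above is built from, here is a pure-Python sketch of the indices/values/shape triple for a hypothetical batch of three string values (the values themselves are made up):

```python
# Hypothetical batch of 3 categorical values, one per example.
values = ['red', 'blue', 'red']
# One [row, 0] index per example, mirroring the comprehension above.
indices = [[i, 0] for i in range(len(values))]
# Dense shape: batch_size x 1.
shape = [len(values), 1]
print(indices)  # [[0, 0], [1, 0], [2, 0]]
print(shape)    # [3, 1]
```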

filenames = [...]
csv_headers = [...] # features and label headers
batch_size = 50
min_after_dequeue = int(num_examples * min_fraction_of_examples_in_queue)
queue_capacity = min_after_dequeue + 3 * batch_size
examples = tf.contrib.learn.read_batch_examples(
    filenames,
    batch_size=batch_size,
    reader=tf.TextLineReader,
    randomize_input=True,
    queue_capacity=queue_capacity,
    num_threads=1,
    read_batch_size=1,
    parse_fn=lambda x: tf.decode_csv(x, [tf.constant([''], dtype=tf.string) for _ in xrange(len(csv_headers))]))

examples_dict = {}
for i, header in enumerate(csv_headers):
    examples_dict[header] = examples[:, i]
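The loop above just names each column of the batched example matrix. With plain Python lists standing in for the batched Tensor (the headers and rows below are made up for illustration), the same mapping looks like:

```python
csv_headers = ['length', 'color', 'label']  # made-up headers
batch = [['5.1', 'red', '1'],               # made-up rows
         ['3.3', 'blue', '0']]
# Map each header to its column, as the loop above does with examples[:, i].
examples_dict = {header: [row[i] for row in batch]
                 for i, header in enumerate(csv_headers)}
print(examples_dict['color'])  # ['red', 'blue']
```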

categorical_cols = []
for header in CATEGORICAL_FEATURES:
    categorical_cols.append(tf.contrib.layers.sparse_column_with_keys(
        header,
        keys  # Keys for that particular feature, source not shown here
    ))
continuous_cols = []
for header in CONTINUOUS_FEATURES:
    continuous_cols.append(tf.contrib.layers.real_valued_column(header))
feature_columns = categorical_cols + continuous_cols

model = tf.contrib.learn.LinearClassifier(
            model_dir=model_dir,
            feature_columns=feature_columns,
            optimizer=optimizer,
            n_classes=num_classes)
# Above code is ok up to this point
model.fit(input_fn=lambda: input_fn(examples_dict),
          steps=200) # This line causes the error ****

Any alternatives for batching would be appreciated as well!

craymichael

1 Answer


I was able to figure out my mistake with the help of the great TensorFlow team! read_batch_examples has to be called within input_fn; otherwise the op belongs to a different graph and has to be run beforehand.
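One way to picture why this matters, with plain Python standing in for graphs (the Context class and names below are hypothetical stand-ins, not TensorFlow API): fit activates its own graph before invoking the callable, so ops must be constructed inside that callable rather than captured from outside it.

```python
class Context:
    # Hypothetical stand-in for tf.Graph / the current default graph.
    current = 'default_graph'

def build_input():
    # Records which context was active when the "op" was constructed.
    return Context.current

def fit(input_fn):
    # Like Estimator.fit: switches to its own graph, then calls input_fn.
    Context.current = 'estimator_graph'
    return input_fn()

# Built eagerly, before fit: captures the default context (the bug).
eager_op = build_input()
print(eager_op)          # default_graph

# Built lazily inside fit: lands in the estimator's context (the fix).
print(fit(build_input))  # estimator_graph
```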

Edit

Here is the modified code that functions properly for those who are interested:

def input_fn(file_names, batch_size):
    examples_dict = read_csv_examples(file_names, batch_size)

    # Continuous features
    feature_cols = {k: tf.string_to_number(examples_dict[k], out_type=tf.float32)
                    for k in CONTINUOUS_FEATURES}
    # Categorical features
    feature_cols.update({
        k: tf.SparseTensor(
            indices=[[i, 0] for i in range(examples_dict[k].get_shape()[0])],
            values=examples_dict[k],
            shape=[int(examples_dict[k].get_shape()[0]), 1])
        for k in CATEGORICAL_FEATURES})

    # Change out type for classification/regression
    out_type = tf.int32 if CLASSIFICATION else tf.float32
    label = tf.string_to_number(examples_dict[LABEL], out_type=out_type)

    return feature_cols, label


def read_csv_examples(file_names, batch_size):

    def parse_fn(record):
        record_defaults = [tf.constant([''], dtype=tf.string)] * len(FEATURE_HEADERS)
        return tf.decode_csv(record, record_defaults)

    examples_op = tf.contrib.learn.read_batch_examples(
        file_names,
        batch_size=batch_size,
        reader=tf.TextLineReader,
        parse_fn=parse_fn)

    # Important: convert examples to a dict for ease of use in `input_fn`.
    # Map each header to its respective column (FEATURE_HEADERS order
    # matters!)
    examples_dict_op = {}
    for i, header in enumerate(FEATURE_HEADERS):
        examples_dict_op[header] = examples_op[:, i]

    return examples_dict_op
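Since Estimator.fit expects a zero-argument callable, the arguments to this input_fn need to be bound first, e.g. with a lambda or functools.partial. A sketch with a stand-in function (the file name and batch size below are made up):

```python
import functools

def input_fn(file_names, batch_size):
    # Stand-in for the real input_fn above.
    return tuple(file_names), batch_size

# Either binding style yields the zero-argument callable fit wants:
bound = functools.partial(input_fn, ['train.csv'], 50)
same = lambda: input_fn(['train.csv'], 50)
print(bound())  # (('train.csv',), 50)
```

The model would then be trained with something like `model.fit(input_fn=bound, steps=200)`, so the op is constructed inside fit's graph.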

This code is near minimal for producing a generic input function for your data. Also note that if you would like to pass num_epochs to read_batch_examples, you'll need to do something different for your categorical features (see this answer for details). Disclaimer: I wrote that answer. Hope this helps!

craymichael
  • I've got the same problem with tensorflow on python3.5, but can't fix it by moving `read_batch_examples` inside `input_fn`. Can you please post the (full) corrected code, so I can see where my mistake is? – Gersee Nov 04 '16 at 15:30
  • @Gersee Sure, check out the updated answer for the updated code. Hope it helps! – craymichael Nov 04 '16 at 15:53
  • Thank you very much. This helps a lot. – Gersee Nov 05 '16 at 15:28