I would like to manage my training with a tf.estimator.Estimator, but I am having trouble using it alongside the tf.data API.

I have something like this:

def model_fn(features, labels, params, mode):
  # Defines model's ops.
  # Initializes with tf.train.Scaffold.
  # Returns a tf.estimator.EstimatorSpec.

def input_fn():
  dataset = tf.data.TextLineDataset("test.txt")
  # map, shuffle, padded_batch, etc.

  iterator = dataset.make_initializable_iterator()

  return iterator.get_next()

estimator = tf.estimator.Estimator(model_fn)
estimator.train(input_fn)

I can't use make_one_shot_iterator for my use case, so input_fn creates an iterator that has to be initialized within model_fn (here, I use tf.train.Scaffold to initialize local ops).

Also, as I understand it, I can't simply use input_fn = iterator.get_next, because then the other ops would not be added to the same graph.

What is the recommended way to initialize the iterator?

guillaumekln

1 Answer

As of TensorFlow 1.5, it is possible to make input_fn return a tf.data.Dataset, e.g.:

def input_fn():
  dataset = tf.data.TextLineDataset("test.txt")
  # map, shuffle, padded_batch, etc.
  return dataset

See c294fcfd.
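
As a slightly fuller sketch of this first approach, the function below returns a batched dataset of (features, label) pairs, with the features given as a dictionary so that each field has a name. The `path` and `batch_size` parameters, the "line" feature name, and the line-length label are placeholders invented for illustration, and `tf.strings.length` assumes a reasonably recent TensorFlow release:

```python
import tensorflow as tf


def input_fn(path="test.txt", batch_size=2):
    """Build the input pipeline and return the Dataset itself."""
    dataset = tf.data.TextLineDataset(path)
    # Hypothetical preprocessing: name the feature "line" and use the
    # byte length of the line as a stand-in label.
    dataset = dataset.map(
        lambda line: ({"line": line}, tf.strings.length(line)))
    dataset = dataset.batch(batch_size)
    # No iterator is created here: the Estimator builds and
    # initializes one from the returned Dataset.
    return dataset
```

It would be passed to the estimator exactly as in the question, i.e. estimator.train(input_fn).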


For earlier versions, you can add the iterator's initializer to the tf.GraphKeys.TABLE_INITIALIZERS collection and rely on the default initializer to run it:

tf.add_to_collection(tf.GraphKeys.TABLE_INITIALIZERS, iterator.initializer)
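
Put together, a minimal sketch of this workaround could look like the following. It is written with the tf.compat.v1 aliases so that it also runs in graph mode under TensorFlow 2.x; in TensorFlow 1.x the plain tf.* names used above should be equivalent. The `path` and `batch_size` parameters are placeholders. This relies on the default Scaffold's local init op running the table initializers, which is why no custom Scaffold is needed:

```python
import tensorflow as tf


def input_fn(path="test.txt", batch_size=2):
    """Return the next-batch tensor from an initializable iterator."""
    dataset = tf.data.TextLineDataset(path)
    dataset = dataset.batch(batch_size)  # placeholder for map, shuffle, etc.

    iterator = tf.compat.v1.data.make_initializable_iterator(dataset)
    # Register the initializer right after creating the iterator, so it
    # is run together with the table initializers at session start-up.
    tf.compat.v1.add_to_collection(
        tf.compat.v1.GraphKeys.TABLE_INITIALIZERS, iterator.initializer)

    return iterator.get_next()
```

Note that input_fn must be called inside a graph (which the Estimator does), since initializable iterators are not available in eager mode.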
guillaumekln
  • Thanks! +1. Just to clarify the answer: you need to add the `tf.add_to_collection...` line before returning from `input_fn()`; then it works fine and you don't need to do anything with `Scaffold` and `local_init_ops`. – Pekka Dec 12 '17 at 12:16
  • Excuse me, is it possible to specify names for each field of the dataset using the first method? For example, my dataset has 2 fields, "age" and "sex", and I want to return a dictionary that looks like {"age": tensor1, "sex": tensor2}. – soloice Oct 09 '18 at 13:23
  • @Pekka @guillaumekln did you add the `tf.add_to_collection(...)` line within `def input_fn()` or elsewhere, within `model_fn()`? If it was added in `model_fn()`, would the line still be `tf.add_to_collection(tf.GraphKeys.TABLE_INITIALIZERS, iterator.initializer)`, or would `iterator.initializer` need to be changed to something else? – reese0106 Oct 23 '18 at 15:55
  • You should add it in `input_fn()`, just after the creation of the iterator. – guillaumekln Oct 23 '18 at 16:33