
The Estimator.fit() function accepts its training data in one of two forms:

  • x, y and batch_size, where x and y can be NumPy arrays or iterators.

    PROS

    1. Easy to use.
    2. Allows feeding data from an arbitrary source, as long as the problem can be decomposed into x and y.

    CONS

    1. No way to specify the number of epochs.
    2. If x and y are arrays, the entire dataset must be held in memory rather than read on the fly (say, from a database).
    3. Whether arrays or iterators, x and y cannot be dictionaries. Most complex problems cannot be reduced to a single input matrix and output matrix and may require multiple input feature matrices.
  • input_fn: a callback function that must return the features and target tensors, or dictionaries of tensors.

    PROS

    1. Allows feeding data from an arbitrary source (in theory).
    2. The returned features and targets can be dictionaries, which makes it possible to solve complex problems that take multiple inputs.

    CONS

    1. The only support I found is for reading files, via read_batch_examples(), read_batch_features(), read_batch_record_features(), etc.
    2. No support for passing a placeholder and a feed_fn, which would allow arbitrary input sources that don't require a queue.
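The three CONS listed for x and y (no epochs, no reading on the fly, no dictionaries) can all be handled on the Python side by a plain generator; the open question is only how to get its output into fit(). A minimal sketch, assuming the data sits in a SQLite table: the table name samples, its columns and the function batch_generator are hypothetical and chosen only for illustration. It fetches rows lazily with a cursor, re-reads the table once per epoch, and yields each batch as a dictionary of feature columns plus a list of targets. It does not use TensorFlow at all.

```python
import sqlite3
from typing import Dict, Iterator, List, Tuple

# One batch: a dict of feature columns plus the matching targets.
Batch = Tuple[Dict[str, List[float]], List[float]]


def batch_generator(conn: sqlite3.Connection, feature_cols: List[str],
                    target_col: str, batch_size: int,
                    num_epochs: int) -> Iterator[Batch]:
    """Yield (features_dict, targets) batches, re-reading the table once per epoch.

    Assumes a hypothetical table `samples` with an integer `id` column.
    Column names come from trusted code, so they are formatted into the SQL.
    """
    cols = ", ".join(feature_cols + [target_col])
    for _ in range(num_epochs):
        # A fresh cursor per epoch; rows are fetched batch by batch, never all at once.
        cursor = conn.execute(f"SELECT {cols} FROM samples ORDER BY id")
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                break
            # Transpose the rows into one list per feature, keyed by column name.
            features = {name: [row[i] for row in rows]
                        for i, name in enumerate(feature_cols)}
            targets = [row[-1] for row in rows]
            yield features, targets


if __name__ == "__main__":
    # In-memory stand-in for a real database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, age REAL, income REAL, label REAL)")
    conn.executemany("INSERT INTO samples (age, income, label) VALUES (?, ?, ?)",
                     [(20.0, 1.0, 0.0), (30.0, 2.0, 1.0), (40.0, 3.0, 1.0)])
    for features, targets in batch_generator(conn, ["age", "income"], "label",
                                             batch_size=2, num_epochs=2):
        print(features, targets)
```

Because each epoch opens a new cursor, memory use is bounded by batch_size rather than by the table size, which is the property the array form of x and y lacks.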

Relevant Discussion

  1. https://github.com/tensorflow/tensorflow/pull/4696#issuecomment-253632403
  2. How to use StreamingDataFeeder as contrib.learn.Estimator.fit()'s input_fn?

In discussion 1., @martinwick suggested using py_func to overcome the CONS of input_fn; however, I am still not sure how. Any suggestions, ideas or blueprints are welcome.
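One possible reading of the py_func suggestion, offered only as a sketch: in TensorFlow 1.x, tf.py_func(func, inp, Tout) runs a Python function inside the graph, but it returns a flat list of tensors (one per entry in Tout) with unknown static shape. A dictionary-valued batch therefore has to be flattened into a fixed key order inside the Python function and re-packed into a dictionary inside input_fn. The helpers make_next_batch and repack and the toy data below are hypothetical and TensorFlow-free so they can run on their own; the commented lines show where the py_func call is assumed to go and are an untested assumption about how the pieces fit.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

# One batch: a dict of feature columns plus the matching targets.
Batch = Tuple[Dict[str, List[float]], List[float]]


def make_next_batch(batches: Iterator[Batch],
                    keys: Sequence[str]) -> Callable[[], Tuple[List[float], ...]]:
    """Wrap a batch iterator in a zero-argument function returning a flat tuple.

    The tuple holds one list per feature key (in the fixed order `keys`)
    followed by the targets, i.e. the flat layout a py_func result has.
    """
    def next_batch() -> Tuple[List[float], ...]:
        # Raises StopIteration once the source is exhausted, so the source
        # itself should decide how many epochs to produce.
        features, targets = next(batches)
        return tuple(features[k] for k in keys) + (targets,)
    return next_batch


def repack(flat: Sequence[List[float]], keys: Sequence[str]) -> Batch:
    """Rebuild (features_dict, targets) from flat outputs, using the same key order."""
    features = {k: flat[i] for i, k in enumerate(keys)}
    return features, flat[len(keys)]


# Assumed TensorFlow 1.x wiring (not executed here):
# def input_fn():
#     # Python floats convert to float64, hence tf.float64 in Tout.
#     flat = tf.py_func(next_batch, [], [tf.float64] * (len(keys) + 1))
#     for t in flat:
#         t.set_shape([None])  # py_func outputs have unknown static shape
#     return repack(flat, keys)

if __name__ == "__main__":
    keys = ["age", "income"]
    source = iter([({"age": [20.0, 30.0], "income": [1.0, 2.0]}, [0.0, 1.0])])
    next_batch = make_next_batch(source, keys)
    flat = next_batch()
    print(flat)
    print(repack(flat, keys))
```

Keeping the key order in a single keys list shared by both helpers is the design point: py_func itself knows nothing about dictionaries, so the order is the only link between the flat outputs and the feature names.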

Abhi
