The Estimator.fit() function takes as arguments either:
(x, y, and batch_size), where x and y can be NumPy arrays or iterators.
PROS
- Easy to use.
- Allows feeding data from an arbitrary source, as long as the problem can be decomposed into x and y.
CONS
- No provision for specifying the number of epochs.
- If x and y are arrays, the whole dataset must be available up front, as opposed to being read on the fly (say, from a database).
- Whether array or iterator, x and y can't be dictionaries. Most complex problems cannot be reduced to a single input matrix and a single output matrix, and may require multiple input feature matrices.
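To make the iterator form of x and y concrete, here is a minimal sketch of reading rows on the fly instead of materialising arrays. Everything in it is an assumption made for illustration: the in-memory SQLite database, the samples table, and the helper names make_demo_db, stream_rows and make_xy_iterators are hypothetical, and the actual fit() call is left out. Re-running the query once per epoch is a workaround for the missing epoch argument, not something the API provides.

```python
import itertools
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database standing in for a real data source (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples (id INTEGER PRIMARY KEY, f1 REAL, f2 REAL, label INTEGER)"
    )
    rows = [(1.0, 2.0, 0), (3.0, 4.0, 1), (5.0, 6.0, 0)]
    conn.executemany("INSERT INTO samples (f1, f2, label) VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def stream_rows(conn, epochs):
    """Yield (features, label) pairs one row at a time, re-running the query once per epoch."""
    for _ in range(epochs):
        cursor = conn.execute("SELECT f1, f2, label FROM samples ORDER BY id")
        for f1, f2, label in cursor:
            yield [f1, f2], label


def make_xy_iterators(conn, epochs):
    """Split one row stream into separate x and y iterators that yield rows in the same order."""
    # tee buffers items only while one iterator runs ahead of the other.
    x_stream, y_stream = itertools.tee(stream_rows(conn, epochs))
    x_iter = (features for features, _ in x_stream)
    y_iter = (label for _, label in y_stream)
    return x_iter, y_iter
```

The two iterators returned by make_xy_iterators() are what would be handed over as x and y. Because both are derived from a single row stream, each feature row stays paired with its label.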
input_fn - a callback function that must return features and target tensors, or dictionaries of tensors.
PROS
- Allows feeding data from an arbitrary source (in theory).
- The returned features and targets can be dictionaries, which makes it possible to solve complex problems that take multiple inputs.
CONS
- I only found support for reading files, using read_batch_examples(), read_batch_features(), read_batch_record_features(), etc.
- No support for passing a placeholder and feed_fn to allow for arbitrary sources of input data that don't require a queue.
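The shape of the input_fn contract described above can be pictured with a small sketch. It only illustrates the return structure and rests on several assumptions: plain Python lists stand in for tensors (a real input_fn has to return tensors), the feature names "image" and "metadata" are hypothetical, and make_input_fn is an illustrative helper rather than part of any library.

```python
def make_input_fn(records):
    """Return a zero-argument callback that yields a (features, target) pair.

    Lists are used here in place of tensors purely to show the structure.
    """
    def input_fn():
        # Several named inputs live side by side in one dictionary.
        features = {
            "image": [r["image"] for r in records],
            "metadata": [r["metadata"] for r in records],
        }
        target = [r["label"] for r in records]
        return features, target

    return input_fn


# Hypothetical records with two separate inputs per example.
records = [
    {"image": [0.1, 0.2], "metadata": [7], "label": 1},
    {"image": [0.3, 0.4], "metadata": [9], "label": 0},
]
features, target = make_input_fn(records)()
```

Wrapping the callback in a closure is one way to bind a data source to a function that is called without arguments.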
Relevant Discussion
- https://github.com/tensorflow/tensorflow/pull/4696#issuecomment-253632403
- How to use StreamingDataFeeder as contrib.learn.Estimator.fit()'s input_fn?
In the first discussion listed above, @martinwick suggested using py_func to overcome the CONS of input_fn; however, I am still not sure how. Any suggestions, ideas, or blueprints are welcome.