I can successfully train my https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/census model/experiment both locally and in the cloud, and I'm able to deploy the sample and run predictions in the cloud.
But if I wanted to run my predictions locally - not in the cloud - how would I go about that?
I'm a novice, but I have tried a couple of naive approaches, all failing; see below for three specific ones.
Any hints or references to snippets are welcome.
:-)
M.
** Update regarding approach #1 in the original post **
If I include the single line:
c = tf.contrib.learn.DNNLinearCombinedClassifier(model_dir=job_dir)
I get an error, see error #a below.
If I naively edit the call to include the missing parameters, the constructor works, but calling predict then fails with error #b, see below. I make wide_columns and deep_columns in model.py global and modify the above line to be:
c = tf.contrib.learn.DNNLinearCombinedClassifier(model_dir=job_dir, linear_feature_columns=model.wide_columns, dnn_feature_columns=model.deep_columns)
My PyCharm debugger confirms that model.wide_columns and model.deep_columns are instantiated/not empty at the time of the call.
This leads to an "empty" classifier: I do not believe the DNNLinearCombinedClassifier picks up any model content from my job_dir. I would have included screenshots from inspecting the classifier while it is instantiated in model.py's build_estimator() (I made it into a variable c there as well, and set a breakpoint) and from the above c in task.py, but I'm not allowed to due to my lack of reputation. The difference is obvious though - e.g. c->params->dnn_hidden_units is empty for the restored classifier, but instantiated ([100, 70, 48, 34]) for the original classifier.
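For reference, what I would have expected to need instead is to rebuild the full estimator via the sample's own build_estimator() in model.py, passing the same constructor arguments as during training - as far as I understand, the feature columns and hidden units live in the constructor, not in the checkpoint. A rough, untested sketch (the exact signature is whatever model.py defines, so this may need adapting):

c = model.build_estimator(
    job_dir,                          # same directory the training run wrote to
    embedding_size=8,                 # must match the values used for training
    hidden_units=[100, 70, 48, 34]
)
p = c.predict(input_fn=eval2_input_fn)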
I include an ls -R of the job_dir (called output); see #c below.
I do rm -rf output before each run so the job_dir is clean.
Clearly I'm going wrong somewhere, but with my lack of insight I'm unable to see where. Any further advice is appreciated.
:-)
M.
----------------------- console output (update) --------------------------
a.
Starting Census: Please lauch tensorboard to see results:
tensorboard --logdir=$MODEL_DIR
2017-05-30 12:14:10.570030: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-30 12:14:10.570042: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-30 12:14:10.570046: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
Traceback (most recent call last):
File "<..>/trainer/task.py", line 199, in <module>
c = tf.contrib.learn.DNNLinearCombinedClassifier(model_dir=job_dir)
File "<..>/.local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 335, in new_func
return func(*args, **kwargs)
File "<..>/.local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/dnn_linear_combined.py", line 597, in __init__
raise ValueError("Either linear_feature_columns or dnn_feature_columns "
ValueError: Either linear_feature_columns or dnn_feature_columns must be defined.
Process finished with exit code 1
b.
Starting Census: Please lauch tensorboard to see results:
tensorboard --logdir=$MODEL_DIR
2017-05-30 12:31:47.967638: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-30 12:31:47.967650: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-30 12:31:47.967653: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
Traceback (most recent call last):
File "<..>/repository/git/13cx/subject-matter/google-cloud/1705cloudml/170530local-save/trainer/task.py", line 206, in <module>
p = c.predict(input_fn=eval2_input_fn)
File "<..>/.local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 335, in new_func
return func(*args, **kwargs)
File "<..>/.local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 335, in new_func
return func(*args, **kwargs)
File "<..>/.local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/dnn_linear_combined.py", line 660, in predict
as_iterable=as_iterable)
File "<..>/.local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 335, in new_func
return func(*args, **kwargs)
File "<..>/.local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/dnn_linear_combined.py", line 695, in predict_classes
as_iterable=as_iterable)
File "<..>/.local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 281, in new_func
return func(*args, **kwargs)
File "<..>/.local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 565, in predict
as_iterable=as_iterable)
File "<..>/.local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 857, in _infer_model
infer_ops = self._get_predict_ops(features)
File "<..>/.local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1188, in _get_predict_ops
return self._call_model_fn(features, labels, model_fn_lib.ModeKeys.INFER)
File "<..>/.local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1103, in _call_model_fn
model_fn_results = self._model_fn(features, labels, **kwargs)
File "<..>/.local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/dnn_linear_combined.py", line 201, in _dnn_linear_combined_model_fn
"dnn_hidden_units must be defined when dnn_feature_columns is "
ValueError: dnn_hidden_units must be defined when dnn_feature_columns is specified.
Process finished with exit code 1
c.
$ ls -R output/
output/:
checkpoint graph.pbtxt model.ckpt-2.data-00000-of-00001
eval model.ckpt-1000.data-00000-of-00001 model.ckpt-2.index
events.out.tfevents.1496140978.yarc-mainlinux model.ckpt-1000.index model.ckpt-2.meta
export model.ckpt-1000.meta
output/eval:
events.out.tfevents.1496140982.yarc-mainlinux events.out.tfevents.1496140987.yarc-mainlinux
output/export:
Servo
output/export/Servo:
1496140989
output/export/Servo/1496140989:
saved_model.pb variables
output/export/Servo/1496140989/variables:
variables.data-00000-of-00001 variables.index
----------** original post **----------
-------- things I've tried ------------
See the bottom of this post for the accompanying code, with references to 1, 2, 3:
1. Reinstantiate the DNNLinearCombinedClassifier with a model_dir parameter pointing to where the model is stored. The plan was then to run the classifier's predict method. I'm not able to make the classifier reflect the saved model.
2. Restore the model through saver.restore(). This works, but I don't understand how to proceed from there, due to my lack of TensorFlow insight I guess.
3. Produce some test data for use with approach 1. Evaluation of the tensors never finishes. How do I evaluate an input batch so that I can see it as a matrix?
--------- accompanying code -----------------
(this code is simply appended to the end of trainer/task.py)
# last original line from task.py:
learn_runner.run(generate_experiment_fn(**arguments), job_dir)
# my stuff:
# 1. restore the classifier from model dir, fails
# c = tf.contrib.learn.DNNLinearCombinedClassifier(model_dir=job_dir)
# 2. restore model, works ok, but then how?
sess = tf.Session()
saver = tf.train.import_meta_graph('output/model.ckpt-1000.meta')
saver.restore(sess, tf.train.latest_checkpoint('./output/'))
# note: restore() already loads the variable values, so there is no need to run
# tf.global_variables_initializer() here (it would overwrite the restored weights)
print("Sanity check, a variable instance {}".format(
    sess.run('dnn/input_from_feature_columns/education_embedding/weights/part_0:0')))
sess.close()
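# 2a. to work out how to proceed from the restored graph, my plan was to list
#     the operation names and look for tensors that could be fed/fetched
#     (inspection only, no session needed):
for op in tf.get_default_graph().get_operations():
    print(op.name)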
# 3. produce some test input (for simplicity we're reusing the eval set); this apparently works, but evaluating it hangs forever
eval2_input_fn = model.generate_input_fn(
    arguments['eval_files'],
    batch_size=arguments['eval_batch_size'],
    shuffle=False
)
# 3a. inspecting some input, the evaluation never ends.
input = eval2_input_fn()
print("input: {}".format(input))
with tf.Session() as sess:
    evalinput = input[1].eval()
    print("evalinput: {}".format(evalinput))
print("\nDone")