What is the difference between a checkpoint and an exported best model?
A checkpoint is, at a minimum, a file containing the values of all the variables of a specific graph, taken at a specific point in time. By "specific graph" I mean that when loading a checkpoint back, TensorFlow loops through all the variables defined in your graph (the one in the session you're running) and searches the checkpoint file for a variable with the same name. For resuming training this is ideal, because your graph always looks the same between restarts.
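As a minimal sketch of that name-based matching (TF 1.x assumed throughout; the variable and checkpoint names here are made up for illustration):
import tensorflow as tf

# Build a graph containing a variable named 'w'; the name must match the
# one stored in the checkpoint for the restore below to find it.
w = tf.get_variable('w', shape=[10], initializer=tf.zeros_initializer())
saver = tf.train.Saver()
with tf.Session() as sess:
    # Restores 'w' by looking up a variable with the same name in the file.
    saver.restore(sess, 'model.ckpt')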
An exported model serves a different purpose. The idea of an exported model is that, once you're done training, you want to get something you can use for inference that doesn't contain all the (heavy) parts that are specific to training (some examples: gradient computation, global step variable, input pipeline, ...).
Moreover, and this is the key point, the way you provide input to the inference model is typically different from the one you use for training. For training, you have an input pipeline that loads, preprocesses, and feeds data to your network. This input pipeline is not part of the model itself and may have to be altered for inference, which is especially relevant when operating with Estimators.
Why do I need a serving input receiver function?
To answer this, I'll first take a step back. Why do we need input functions at all, and what are they? TF's Estimators, while perhaps not as intuitive as other ways to model networks, have a great advantage: they cleanly separate model logic from input-processing logic by means of input functions and model functions.
A model goes through three different phases: training, evaluation, and inference. For the most common use cases (or at least all I can think of at the moment), the graph running in TF will differ in each of these phases. The graph is the combination of input preprocessing, the model, and all the machinery necessary to run the model in the current phase.
A few examples to hopefully clarify further: when training, you need gradients to update the weights, an optimizer that runs the training step, metrics of all kinds to monitor how things are going, an input pipeline that grabs data from the training set, etc. When evaluating, you don't need the gradients, and you need a different input function. When doing inference, all you need is the forward part of the model, and again the input function will be different (no tf.data.* machinery, typically just a placeholder).
Each of these phases in Estimators has its own input function. You're familiar with the training and evaluation ones; the inference one is simply your serving input receiver function. In TF lingo, "serving" is the process of packing a trained model and using it for inference (there's a whole TensorFlow Serving system for large-scale operation, but that's beyond this question and you most likely won't need it anyhow).
Time to quote a TF guide on the topic:
During training, an input_fn() ingests data and prepares it for use by the model. At serving time, similarly, a serving_input_receiver_fn() accepts inference requests and prepares them for the model. This function has the following purposes:
- To add placeholders to the graph that the serving system will feed with inference requests.
- To add any additional ops needed to convert data from the input format into the feature Tensors expected by the model.
Now, the specification of the serving input function depends on how you plan on sending input to your graph.
If you're going to pack the data in a (serialized) tf.Example (which is similar to one of the records in your TFRecord files), your serving input function will have a string placeholder (for the serialized bytes of the example) and will need a specification of how to interpret the example in order to extract its data. If this is the way you want to go, I invite you to have a look at the example in the linked guide above; it essentially shows how you set up the specification of how to interpret the example and parse it to obtain the input data.
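For reference, a minimal sketch of that route using the canned helper (the feature name and shape here are assumptions; adapt them to your actual Example layout):
feature_spec = {
    'input_data': tf.FixedLenFeature(shape=[28, 28, 1], dtype=tf.float32),
}
# The resulting serving input function exposes a single string placeholder
# that receives serialized tf.Example protos and parses them with feature_spec.
serving_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)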
If, instead, you're planning on feeding input directly to the first layer of your network, you still need to define a serving input function, but this time it will only contain a placeholder that is plugged directly into the network. TF offers a function that does just that: tf.estimator.export.build_raw_serving_input_receiver_fn.
So, do you actually need to write your own input function? If all you need is a placeholder, no; just use build_raw_serving_input_receiver_fn with the appropriate parameters. If you need fancier preprocessing, then yes, you might need to write your own. In that case, it would look something like this:
def serving_input_receiver_fn():
    """Assume the input to the network is a 28x28 grayscale image that needs preprocessing."""
    input_images = tf.placeholder(dtype=tf.uint8,
                                  shape=[None, 28, 28, 1],
                                  name='input_images')
    # Here you do all the operations you need on the images before they can be
    # fed to the net (e.g., normalizing, reshaping, etc.). For example, cast to
    # float and scale to [0, 1]:
    images = tf.cast(input_images, tf.float32) / 255.0
    # This is the dict that is then passed as the "features" parameter to your model_fn.
    features = {'input_data': images}
    # As far as I understand, this maps the input to a name you can retrieve later.
    receiver_tensors = {'input_data': input_images}
    return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)
How can I train my estimator, save the best model, and then later load it?
Your model_fn takes the mode parameter so that you can build the model conditionally. In your colab, you always have an optimizer, for example. This is wrong, as the optimizer should only be there for mode == tf.estimator.ModeKeys.TRAIN.
Secondly, your build_fn has an "outputs" parameter that is meaningless. This function should represent your inference graph, take as input only the tensors you'll feed to it at inference time, and return the logits/predictions. I'll thus assume the outputs parameter is not there, as the build_fn signature should be def build_fn(inputs, params).
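For concreteness, a hypothetical build_fn respecting that signature could look like this (the layer sizes and the params key are placeholders, not your actual code):
def build_fn(inputs, params):
    # Pure inference graph: tensors in, logits out. No optimizer, no labels.
    net = tf.layers.flatten(inputs)
    net = tf.layers.dense(net, 128, activation=tf.nn.relu)
    return tf.layers.dense(net, params['output_size'])  # logits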
Moreover, you define your model_fn to take features as a tensor. While this can be done, it both limits you to having exactly one input and complicates things for the serving_fn (you can't use the canned build_raw_... but need to write your own and return a TensorServingInputReceiver instead). I'll choose the more generic solution and assume your model_fn is as follows (I omit the variable scope for brevity, add it as necessary):
def model_fn(features, labels, mode, params):
    my_input = features["input_data"]
    my_input.set_shape(I_SHAPE(params['batch_size']))

    # output of the network
    onet = build_fn(my_input, params)  # feed the input tensor, not the whole features dict
    predicted_labels = tf.nn.sigmoid(onet)
    predictions = {'labels': predicted_labels, 'logits': onet}
    export_outputs = {  # see EstimatorSpec's docs to understand what this is and why it's necessary.
        'labels': tf.estimator.export.PredictOutput(predicted_labels),
        'logits': tf.estimator.export.PredictOutput(onet),
        # with more than one entry, one of them must be keyed as the default serving signature:
        tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
            tf.estimator.export.PredictOutput(predictions)
    }
    # NOTE: export_outputs can also be used to save models as "SavedModel"s during evaluation.

    # HERE is where the common part of the graph between training, inference and evaluation stops.
    if mode == tf.estimator.ModeKeys.PREDICT:
        # return early and avoid adding the rest of the graph, which has nothing to do with inference.
        return tf.estimator.EstimatorSpec(mode=mode,
                                          predictions=predictions,
                                          export_outputs=export_outputs)

    labels.set_shape(O_SHAPE(params['batch_size']))

    # calculate loss
    loss = loss_fn(onet, labels)

    # add the optimizer only if we're training
    if mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = tf.train.AdagradOptimizer(learning_rate=params['learning_rate'])

    # some metrics used both in training and eval
    mae = tf.metrics.mean_absolute_error(labels=labels, predictions=predicted_labels, name='mae_op')
    mse = tf.metrics.mean_squared_error(labels=labels, predictions=predicted_labels, name='mse_op')
    metrics = {'mae': mae, 'mse': mse}
    tf.summary.scalar('mae', mae[1])
    tf.summary.scalar('mse', mse[1])

    if mode == tf.estimator.ModeKeys.EVAL:
        return tf.estimator.EstimatorSpec(mode, loss=loss, eval_metric_ops=metrics,
                                          predictions=predictions, export_outputs=export_outputs)

    if mode == tf.estimator.ModeKeys.TRAIN:
        train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op,
                                          eval_metric_ops=metrics, predictions=predictions,
                                          export_outputs=export_outputs)
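For context, this is roughly the training run the next steps refer to (a sketch; train_input_fn, eval_input_fn and the params values are assumptions standing in for your own):
est = tf.estimator.Estimator(model_fn=model_fn, model_dir='ckpt_dir',
                             params={'batch_size': 32, 'learning_rate': 0.01})
train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=10000)
eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn)
tf.estimator.train_and_evaluate(est, train_spec, eval_spec)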
Now, to set up the exporting part, after your call to train_and_evaluate finishes:
1) Define your serving input function:
serving_fn = tf.estimator.export.build_raw_serving_input_receiver_fn(
    {'input_data': tf.placeholder(tf.float32, [None, #YOUR_INPUT_SHAPE_HERE (without batch size)#])})
2) Export the model to some folder
est.export_savedmodel('my_directory_for_saved_models', serving_fn)
This will save the current state of the estimator to wherever you specified. If you want a specific checkpoint, load it before calling export_savedmodel.
This will save in "my_directory_for_saved_models" a prediction graph with the trained parameters the estimator had when you called the export function.
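If you want to load the export back in Python for quick testing, a minimal sketch (TF 1.x; my_batch is a hypothetical numpy array shaped like your input):
from tensorflow.contrib import predictor

# export_savedmodel returns the timestamped export path; reuse it here.
export_dir = est.export_savedmodel('my_directory_for_saved_models', serving_fn)
if isinstance(export_dir, bytes):  # export_savedmodel may return bytes in Python 3
    export_dir = export_dir.decode()
predict_fn = predictor.from_saved_model(export_dir)
# The feed key matches the one passed to build_raw_serving_input_receiver_fn.
result = predict_fn({'input_data': my_batch})
print(result['labels'])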
Finally, you might want to freeze the graph (look up freeze_graph.py) and optimize it for inference (look up optimize_for_inference.py and/or transform_graph), obtaining a frozen *.pb file you can then load and use for inference as you wish.
Edit: Adding answers to the new questions in the update
Sidenote:
My “goal” (motivating me to ask this question) is to try and build a
reusable framework for training networks so I can just pass a
different build_fn and go (plus have the quality of life features of
exported model, early stopping, etc).
By all means, if you manage, please post it on GitHub somewhere and link it to me. I've been trying to get just the same thing up and running for a while now and the results are not quite as good as I'd like them to be.
Question 1:
In other words, it is my understanding (perhaps wrongly) that the
point of a tf Example / SequenceExample is to store a complete
singular datum entity ready to go - no other processing needed other
than reading from the TFRecord file.
Actually, this is typically not the case (although your way is, in theory, perfectly fine too).
You can see TFRecords as an (awfully documented) way to store a dataset compactly. For image datasets, for example, a record typically contains the compressed image data (as in, the bytes composing a jpeg/png file), its label, and some meta information. Then the input pipeline reads a record, decodes it, preprocesses it as needed, and feeds it to the network. Of course, you can move the decoding and preprocessing before the generation of the TFRecord dataset and store the ready-to-feed data in the examples, but the size blowup of your dataset will be huge.
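A minimal sketch of that typical layout (the feature names here are assumptions; adapt them to how you actually wrote your records):
import tensorflow as tf

def parse_record(serialized_example):
    spec = {
        'image_bytes': tf.FixedLenFeature([], tf.string),  # compressed jpeg/png bytes
        'label': tf.FixedLenFeature([], tf.int64),
    }
    parsed = tf.parse_single_example(serialized_example, spec)
    # Decoding happens in the input pipeline, not in the stored record.
    image = tf.image.decode_image(parsed['image_bytes'])
    return image, parsed['label']

dataset = tf.data.TFRecordDataset('train.tfrecord').map(parse_record)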
The specific preprocessing pipeline is one example of what changes between phases (for example, you might have data augmentation in the training pipeline but not in the others). Of course, there are cases in which these pipelines are the same, but in general this is not true.
About the aside:
"When evaluating, you don't need the gradients and you need a different input function." — is the only difference (at least in my case) the files from which you're reading?
In your case that may be. But again, assume you're using data augmentation: you need to disable it (or, better, not have it at all) during eval, and this alters your pipeline.
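To make the augmentation point concrete, a sketch of a preprocessing step shared by both pipelines, with the augmentation gated on the phase (the function and flag names are made up):
def preprocess(image, label, is_training):
    image = tf.image.convert_image_dtype(image, tf.float32)
    if is_training:
        # Augmentation exists only in the training pipeline.
        image = tf.image.random_flip_left_right(image)
    return image, label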
Question 2: What if I train my model with records and want to run inference with just the dense tensors?
This is precisely why you separate the pipeline from the model.
The model takes as input a tensor and operates on it. Whether that tensor is a placeholder or is the output of a subgraph that converts it from an Example to a tensor, that's a detail that belongs to the framework, not to the model itself.
The splitting point is the model input. The model expects a tensor (or, in the more generic case, a dict of name:tensor items) as input and uses that to build its computation graph. Where that input comes from is decided by the input functions, but as long as the output of all input functions has the same interface, one can swap inputs as needed and the model will simply take whatever it gets and use it.
So, to recap, assuming you train/eval with Examples and predict with dense tensors, your train and eval input functions will set up a pipeline that reads examples from somewhere, decodes them into tensors and returns those to the model to use as inputs. Your predict input function, on the other hand, just sets up one placeholder per input of your model and returns them to the model, because it assumes you'll put in the placeholders the data ready to be fed to the network.
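In code, the idea looks roughly like this (a sketch reusing the hypothetical parse_record from above; shapes, batch size, and filenames are assumptions):
def train_input_fn():
    dataset = (tf.data.TFRecordDataset('train.tfrecord')
               .map(parse_record)
               .batch(32))
    images, labels = dataset.make_one_shot_iterator().get_next()
    # Same interface as the serving input function below: {'input_data': tensor}
    return {'input_data': images}, labels

predict_serving_fn = tf.estimator.export.build_raw_serving_input_receiver_fn(
    {'input_data': tf.placeholder(tf.float32, [None, 28, 28, 1])})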
Question 3:
You pass the placeholder as a parameter of build_raw_serving_input_receiver_fn, so you choose its name:
tf.estimator.export.build_raw_serving_input_receiver_fn(
    {'images': tf.placeholder(tf.float32, [None, 28, 28, 1], name='input_images')})
Question 4:
There was a mistake in the code (I had mixed up two lines): the dict's key should have been input_data (I amended the code above).
The key in the dict has to be the key you use to retrieve the tensor from features in your model_fn. In model_fn the first line is:
my_input = features["input_data"]
hence the key is 'input_data'.
As for the key in receiver_tensors, I'm still not quite sure what role it has, so my suggestion is to try setting a different name than the key in features and check where the name shows up.
Question 5:
I'm not sure I understand, I'll edit this after some clarification