TensorFlow: load a pre-trained model and use a different optimizer

Question

I want to load a pre-trained model (optimized with AdadeltaOptimizer) and continue training with SGD (GradientDescentOptimizer). The models are saved and loaded with the TensorLayer API:

save model:

import tensorlayer as tl
tl.files.save_npz(network.all_params,
                  name=model_dir + "model-%d.npz" % global_step)

load model:

load_params = tl.files.load_npz(path=resume_dir + '/', name=model_name)
tl.files.assign_params(sess, load_params, network)

If I continue training with Adadelta, the training loss (cross entropy) looks normal: it starts at a value close to that of the loaded model. However, if I switch the optimizer to SGD, the training loss is as large as that of a newly initialized model.

I took a look at the model-xxx.npz file written by tl.files.save_npz. It only stores the model parameters as ndarrays, so I'm not sure how the optimizer or learning rate is involved here.
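One way to check what such a file contains is to open it with NumPy directly. The sketch below does not call TensorLayer. It imitates the file layout with plain NumPy, under the assumption that the parameters are stored as one object array under a `params` key. The array shapes and the file name are made up for illustration.

```python
import os
import tempfile

import numpy as np

# Stand-ins for the evaluated network parameters: plain ndarrays.
weights = np.ones((3, 2), dtype=np.float32)
biases = np.zeros(2, dtype=np.float32)

# Assumption: the file holds a single object array under the key "params".
params = np.empty(2, dtype=object)
params[0] = weights
params[1] = biases

path = os.path.join(tempfile.mkdtemp(), "model-0.npz")
np.savez(path, params=params)

# Inspect the archive: list its keys and the shape of every stored array.
with np.load(path, allow_pickle=True) as archive:
    keys = list(archive.files)
    loaded = archive["params"]

print(keys)
for p in loaded:
    print(p.shape, p.dtype)
```

If the real file shows the same picture (parameter arrays only), then no optimizer state or learning rate travels with the checkpoint.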

Joshua Lim · Accepted Answer · 2017-06-27T02:26:09.180

You will probably have to import the loss tensor (the cross-entropy) that fed your previous optimizer (Adadelta in the question) into a variable, then feed it to an SGD optimizer instead.

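import tensorflow as tf  # TensorFlow 1.x API

sess = tf.Session()
# Rebuild the saved graph and restore its variables from the latest checkpoint
saver = tf.train.import_meta_graph('filename.meta')
saver.restore(sess, tf.train.latest_checkpoint('./'))
graph = tf.get_default_graph()
cross_entropy = graph.get_tensor_by_name("entropy:0")  # loss tensor to import

learning_rate = 0.01  # example value only; tune it for SGD
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)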

Here I gave the cross-entropy tensor the name entropy before training the pre-trained model, like this:

cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv), name='entropy')

If you cannot change the pre-trained model, you can list the tensors in the graph (after importing it) and work out which one you need. I have no experience with TensorLayer, so treat this as a general guide. Have a look at Tensorlayer-Layers, which should explain how to obtain the tensor. Since TensorLayer is built on top of TensorFlow, most TensorFlow functions should still be available.

Thanks for the answer. But I'm confused why I have to import the tensor "which computes the loss previously on your Adam Optimizer". The loss is supposed to be the same given the same model parameters regardless of optimizer, right? — Irene W., Jun 23 '17 at 17:43
I have made changes in my answer, hope this is clearer. So effectively, I am just changing the type of optimizer from Adam to SGD. — Joshua Lim, Jun 27 '17 at 02:27
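The point raised in the comment above can be illustrated with a toy example: before any update, the loss depends only on the parameters, so a change of optimizer can affect it only from the first training step onward. The sketch below is pure Python with a made-up one-parameter quadratic loss. The parameter value and the two learning rates are hypothetical. It shows one possible way the loss could jump after a single SGD step, and the thread does not confirm that this is the cause here.

```python
def loss(w):
    # Toy quadratic loss with its minimum at w = 3.
    return (w - 3.0) ** 2


def grad(w):
    # Derivative of the toy loss with respect to w.
    return 2.0 * (w - 3.0)


def sgd_step(w, learning_rate):
    # One plain gradient-descent update.
    return w - learning_rate * grad(w)


restored_w = 2.9  # hypothetical "pre-trained" parameter, close to the optimum

# Before any update, the loss is a function of the parameters alone.
initial_loss = loss(restored_w)

# After one update, the result depends on the optimizer settings.
w_small = sgd_step(restored_w, learning_rate=0.1)
w_large = sgd_step(restored_w, learning_rate=5.0)

print(initial_loss, loss(w_small), loss(w_large))
```

In practice, evaluating the loss once right after restoring the parameters and before running any training step separates a loading problem from an optimizer or learning-rate problem.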

score 1 · Answer 2 · answered Jun 23 '17 at 07:18

You can specify the parameters you want to save in your checkpoint file.

save_npz([save_list, name, sess])

The save_list you pass contains only the network parameters, not the optimizer's variables, so neither the learning rate nor any other optimizer state is saved.

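If you want to save the current learning rate (so that you can use exactly the same learning rate when you restore the model), you have to add it to the save_list, like this:

# list.extend() returns None, so build a new list instead; learning_rate must be a tf.Variable
tl.files.save_npz(network.all_params + [learning_rate], name=model_dir + "model-%d.npz" % global_step)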

(I assume that all_params is a Python list; I believe that assumption is correct.)

Since you want to change the optimizer, I suggest saving only the learning_rate as an optimizer parameter and none of the other variables the optimizer creates. That way you can switch optimizers and still restore the model. Otherwise, if the checkpoint contains any other optimizer variable, the graph you restore into will have no variable to receive the saved value, and you won't be able to change the optimizer.
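As a rough sketch of that separation, the snippet below filters optimizer-created variables out of a list of names. It is pure Python and the names are hypothetical. It assumes that optimizer slot variables carry the optimizer's name as a path component, which is worth checking against the variable names in your own graph.

```python
# Hypothetical variable names, for illustration only.
all_variable_names = [
    "conv1/W:0",
    "conv1/b:0",
    "conv1/W/Adadelta:0",
    "conv1/W/Adadelta_1:0",
    "conv1/b/Adadelta:0",
    "conv1/b/Adadelta_1:0",
]


def is_optimizer_slot(name, optimizer_name="Adadelta"):
    # Assumption: slot variables contain "/<optimizer_name>" in their name.
    return "/" + optimizer_name in name


# Keep only the model's own parameters for the checkpoint.
model_variable_names = [
    name for name in all_variable_names if not is_optimizer_slot(name)
]
print(model_variable_names)
```

A checkpoint built from the filtered list can be restored into a graph that uses any optimizer, because it contains nothing the new optimizer would have to match.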

In fact I only saved `network.all_params`, which doesn't contain any other variable that Adadelta creates, right? I didn't save the learning rate either; I assign an initial learning rate for SGD after restoring the model. — Irene W., Jun 23 '17 at 17:37

score 0 · Answer 3 · answered Aug 08 '19 at 03:19

For version 2.0, use this:

https://tensorlayer.readthedocs.io/en/latest/user/get_start_advance.html#pre-trained-cnn

import numpy as np
import tensorlayer as tl

vgg = tl.models.vgg16(pretrained=True)
img = tl.vis.read_image('data/tiger.jpeg')
img = tl.prepro.imresize(img, (224, 224)).astype(np.float32) / 255
output = vgg(img, is_train=False)
