
After training my model, I save it and later try to restore it to compute the cost/accuracy on the dev set.

Right before restoring, I run the following statement to inspect the variables stored in the checkpoint.

from tensorflow.python.tools import inspect_checkpoint as chkp

chkp.print_tensors_in_checkpoint_file("./trained_models/my_nn_model.ckpt", tensor_name='', 
                                      all_tensors=True, all_tensor_names=True)

And I see the following output:

tensor_name:  biases/b1
[[ 0.4088161 ]
 [ 0.73051345]
 [ 0.861546  ]
 [-0.01601586]]
tensor_name:  biases/b1/Adam
[[ 0.06940479]
 [-0.01317821]
 [ 0.00601695]
 [ 0.0169837 ]]
tensor_name:  biases/b1/Adam_1
[[0.00422197]
 [0.00048599]
 [0.00077043]
 [0.00035076]]
tensor_name:  biases/b2
[[ 0.80142576]
 [-0.09536028]
 [ 0.31366938]]
tensor_name:  biases/b2/Adam
[[ 0.08435135]
 [ 0.03394406]
 [-0.04104255]]
tensor_name:  biases/b2/Adam_1
[[0.00650834]
 [0.00206493]
 [0.00083752]]
tensor_name:  biases/b3
[[-0.6808493 ]
 [ 0.42616928]]
tensor_name:  biases/b3/Adam
[[ 0.11350942]
 [-0.11350942]]
tensor_name:  biases/b3/Adam_1
[[0.00629836]
 [0.00629836]]
tensor_name:  train/beta1_power
0.004638391
tensor_name:  train/beta2_power
0.9502551
tensor_name:  weights/W1
[[ 0.35077223  0.30753523  0.19711483 -0.5701605   0.22447775]
 [-0.7757121  -0.20513503  0.4545326  -0.14088248  0.4854558 ]
 [-0.66474247  0.28792825  0.06203659 -0.0888676  -0.74835175]
 [-0.41984704 -0.5626613  -0.02844676  0.77327466  0.19199598]]
tensor_name:  weights/W1/Adam
[[ 0.13355881  0.4353028   0.4103592   0.14981574  0.27531895]
 [ 0.01698016 -0.07343768 -0.11361112 -0.04086655 -0.07324728]
 [-0.00324349  0.02257502  0.04864099  0.02607765  0.0225742 ]
 [ 0.11069385  0.09307133  0.06229053  0.07731174  0.08953418]]
tensor_name:  weights/W1/Adam_1
[[0.06442691 0.11718791 0.16552295 0.10027011 0.11132942]
 [0.00597157 0.01351114 0.01625086 0.0113084  0.01210043]
 [0.0034455  0.0109939  0.04340019 0.02456977 0.01193165]
 [0.010284   0.01212158 0.01438992 0.01114361 0.01298358]]
tensor_name:  weights/W2
[[ 0.6157185  -0.02184171  0.5163279  -0.3498895 ]
 [-0.15082173  0.21863511 -0.21755247  0.39887637]
 [-0.5565993   0.65659076 -0.6370119   0.41734824]]
tensor_name:  weights/W2/Adam
[[ 0.39385152  0.27537686  0.01230302 -0.05157183]
 [ 0.08531421  0.15998691  0.00756624  0.01899205]
 [-0.11271227 -0.18292099 -0.00443625 -0.0315922 ]]
tensor_name:  weights/W2/Adam_1
[[0.11990622 0.17129508 0.00665622 0.0358038 ]
 [0.03782089 0.06448739 0.00252486 0.01346588]
 [0.00787948 0.01284081 0.00035877 0.00662182]]
tensor_name:  weights/W3
[[ 0.5939301   0.605848   -0.59496546]
 [-0.23180145  0.17120583  0.04733036]]
tensor_name:  weights/W3/Adam
[[ 0.40406024  0.07094829  0.11723397]
 [-0.40406027 -0.07094829 -0.11723398]]
tensor_name:  weights/W3/Adam_1
[[0.11013244 0.03589008 0.01292834]
 [0.11013244 0.03589008 0.01292834]]

I expect to see biases/b1, weights/W1 etc.

But I do not want to see biases/b1/Adam, biases/b1/Adam_1, etc.

The TensorFlow documentation says the following: "Estimators automatically saves and restores variables (in the model_dir)." As I am using the AdamOptimizer in my model, I assume the extra variables I see above (biases/b1/Adam, etc.) are related to this statement.

But it is quite confusing.

  • Which b1 variable is my final variable after training my model? For example, is it biases/b1, biases/b1/Adam, or biases/b1/Adam_1?

  • These biases/b1/Adam.. variables also do not seem to agree with my program: when I restore my model, I get a runtime error saying "cannot add op with name weights/W1/Adam as that name is already used". How am I actually supposed to solve this problem?

edn

1 Answer


Your final variable is 'biases/b1'. The other two are created by the Adam optimizer, which uses them during training to maintain estimates of the first and second moments of the gradients. These variables might still be valuable to you if you would like to continue training after restoring the model.

This answer shows that you can save only the trainable variables with the following line of code:

saver=tf.train.Saver(var_list=tf.trainable_variables())

If you introduce new variables that differ from the ones saved in the checkpoint, you need to initialize them manually, since the Saver will not do that for you.

y.selivonchyk
  • Thank you for your answer. After implementing your tips, it solved the problem and I do not see Adam related variables anymore when I print out my model variables. But I am still getting the same error: "cannot add op with name weights/W1/Adam as that name is already used" I believe the critical part of the whole code is the following: saver = tf.train.import_meta_graph('./trained_models/my_nn_model.ckpt.meta') saver.restore(sess_dev, './trained_models/my_nn_model.ckpt') saver.restore(...) is the one giving this error. Would you happen to have any comments? I am a bit stuck here – edn Mar 27 '18 at 17:42
  • If you are doing tf.train.import_meta_graph, you should not build any other operations, neither before nor after it. It seems that you define some operations, build an optimizer, and then restore the graph, which already contains those ops. Long story short: while using tf.train.import_meta_graph, store all variables in the checkpoint as before, don't construct any ops (i.e. add another .py file where you restore the model), and query the existing operations by name – y.selivonchyk Mar 27 '18 at 18:00
  • Let me see if I get it right now. What I was doing is I was first training my model in a session. After that session is closed, I was calling a function (which is in the same Jupyter notebook) to evaluate my model on the dev set where I tried to restore the model in a new session. And it was failing. But I tried to restore the model in a new separate file and the restore operation went well. But if I, e.g., run "print(sess_dev.run("W1"))", it just prints out "None". I do not see any documentation either how I can call operations from my saved model. Do you know any source? – edn Mar 27 '18 at 18:28
  • helper one: tf.reset_default_graph() – y.selivonchyk Mar 27 '18 at 21:21
  • @edn helper 2: tf.get_default_graph().get_operation_by_name(name) – y.selivonchyk Mar 27 '18 at 21:22
  • I managed to solve the problem, but after endless hours in front of the screen I have lost track of my changes. The new (and actually the underlying) problem is how I can evaluate the cost and accuracy of my model on the dev set periodically (say, every 100 epochs or so) while training is going on. The challenge is that the train data and dev data reside in different csv files. And I am very much lost among tens of crashes that do not give any meaningful insight.. You are more than welcome to share your recommendations. :) – edn Mar 28 '18 at 01:21
  • @edn You can take a look at the question over here to see how to use it with dataset API: https://stackoverflow.com/questions/47356764/how-to-use-tensorflow-dataset-api-with-training-and-validation-sets/47357914#47357914 You can take a look at tensorpack framework on top of tensorflow - it solves many tricky parts, but it is a framework. Other than that, just define your input as a placeholder and whenever you want just run only cost calculations on validation data instead of training on your train data. Apart from that, good luck! First steps are the hardest. – y.selivonchyk Mar 28 '18 at 03:35
  • I will check out the tensorpack framework. I also checked out the other entry. My solution looks a bit different, though, as I read my input data from csv files. I have a side question related to this, but seemingly many people are suffering from a similar problem, so I am not sure if I will get an answer. You can have a look if you want. Here is the link: https://stackoverflow.com/questions/49525056/tensorflow-python-reading-2-files – edn Mar 28 '18 at 04:06