
I have tried to use the checkpoint to resume my graph, but with no luck. After many tries I decided to run the demo code from http://cv-tricks.com/tensorflow-tutorial/save-restore-tensorflow-models-quick-complete-tutorial/, and that doesn't work either.

I'm using the latest TensorFlow 1.1.0 GPU build.

Here's the demo code I used:

saver.py

import tensorflow as tf
w1 = tf.Variable(tf.random_normal(shape=[2]), name='w1')
w2 = tf.Variable(tf.random_normal(shape=[5]), name='w2')
saver = tf.train.Saver()
sess = tf.Session()
sess.run(tf.global_variables_initializer())
saver.save(sess, 'my_test_model')

restore.py

import tensorflow as tf

with tf.Session() as sess:    
    saver = tf.train.import_meta_graph('my_test_model.meta')
    saver.restore(sess,tf.train.latest_checkpoint('./'))
    print(sess.run(w1))

Console output

me@ubuntu:~/Projects/playground$ python3 saver.py
2017-05-11 17:12:16.833443: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-11 17:12:16.833469: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-11 17:12:16.833476: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-05-11 17:12:16.833481: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-11 17:12:16.833486: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-05-11 17:12:17.085355: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties: 
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:06:00.0
Total memory: 7.92GiB
Free memory: 7.81GiB
2017-05-11 17:12:17.085421: W tensorflow/stream_executor/cuda/cuda_driver.cc:485] creating context when one is currently active; existing: 0x2a74b80
2017-05-11 17:12:17.299874: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 1 with properties: 
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:05:00.0
Total memory: 7.92GiB
Free memory: 7.30GiB
2017-05-11 17:12:17.300703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0 1 
2017-05-11 17:12:17.300716: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0:   Y Y 
2017-05-11 17:12:17.300723: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 1:   Y Y 
2017-05-11 17:12:17.300732: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:06:00.0)
2017-05-11 17:12:17.300739: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 1080, pci bus id: 0000:05:00.0)
me@ubuntu:~/Projects/playground$ python3 restore.py
2017-05-11 17:12:26.689147: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-11 17:12:26.689174: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-11 17:12:26.689182: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-05-11 17:12:26.689188: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-11 17:12:26.689194: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-05-11 17:12:26.950831: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties: 
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:06:00.0
Total memory: 7.92GiB
Free memory: 7.81GiB
2017-05-11 17:12:26.950894: W tensorflow/stream_executor/cuda/cuda_driver.cc:485] creating context when one is currently active; existing: 0x25565b0
2017-05-11 17:12:27.159857: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 1 with properties: 
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:05:00.0
Total memory: 7.92GiB
Free memory: 7.30GiB
2017-05-11 17:12:27.160708: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0 1 
2017-05-11 17:12:27.160722: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0:   Y Y 
2017-05-11 17:12:27.160729: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 1:   Y Y 
2017-05-11 17:12:27.160742: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:06:00.0)
2017-05-11 17:12:27.160750: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 1080, pci bus id: 0000:05:00.0)
Traceback (most recent call last):
  File "restore.py", line 6, in <module>
    print(sess.run(w1))
NameError: name 'w1' is not defined

From what I have read, I believe w1 should be automatically recreated as a result of restoring the graph. Any thoughts on this would be welcome. Thank you!

xxbidiao

2 Answers


The NameError: name 'w1' is not defined occurs because you have split the code into two files, so restore.py has no Python variable named w1 in scope.

To fix this, edit your restore.py as follows:

import tensorflow as tf
from saver import w1
with tf.Session() as sess:    
    saver = tf.train.import_meta_graph('my_test_model.meta')
    saver.restore(sess,tf.train.latest_checkpoint('./'))
    print(sess.run(w1))

Make sure saver.py is in the same directory. Note that importing from saver.py re-executes it, which rebuilds the graph (and re-saves the checkpoint) in the same process; that is how the name w1 becomes available in restore.py.

thefifthjack005
  • An interesting approach, referencing the whole module. In other words, does this mean I have to rebuild the graph to get references to its tensors? – xxbidiao May 12 '17 at 13:34

The following example shows two ways to restore: redefining the graph in code, or importing it from the .meta file and looking tensors up by name.

import tensorflow as tf

tf.reset_default_graph()
w1 = tf.Variable(tf.random_normal(shape=[2]), name='w1')
w2 = tf.Variable(tf.random_normal(shape=[5]), name='w2')
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(w1))
    print(sess.run(w2))
    saver.save(sess, './my_test_model')

tf.reset_default_graph()
# Method 1: redefine computation graph
w1 = tf.Variable(tf.random_normal(shape=[2]), name='w1')
w2 = tf.Variable(tf.random_normal(shape=[5]), name='w2')
saver = tf.train.Saver()
with tf.Session() as sess:    
    saver.restore(sess, tf.train.latest_checkpoint('./'))
    print(sess.run(w1))
    print(sess.run(w2))

tf.reset_default_graph()
# Method 2: import computation graph and get tensor
saver = tf.train.import_meta_graph('my_test_model.meta')
w1 = tf.get_default_graph().get_tensor_by_name('w1:0')
w2 = tf.get_default_graph().get_tensor_by_name('w2:0')
with tf.Session() as sess:    
    saver.restore(sess, tf.train.latest_checkpoint('./'))
    print(sess.run(w1))
    print(sess.run(w2))
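
If you are working with a legacy graph and do not know the tensor names up front, Method 2 still works: you can list the operations in the imported graph and then fetch tensors by name. A minimal sketch, assuming the same my_test_model checkpoint saved above:

import tensorflow as tf

tf.reset_default_graph()
# Import the graph structure from the .meta file without redefining it in code.
saver = tf.train.import_meta_graph('my_test_model.meta')
graph = tf.get_default_graph()

# List every operation name in the imported graph; a tensor name is '<op_name>:0'.
for op in graph.get_operations():
    print(op.name)

with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('./'))
    w1 = graph.get_tensor_by_name('w1:0')
    print(sess.run(w1))

This keeps the graph definition entirely in the checkpoint files, so the legacy code does not have to be duplicated in the restore script.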
Kai Kang
  • Again, does this mean I have to define a name for each tensor? I'm working on a legacy piece of code and would like to minimize the rewriting overhead. – xxbidiao May 15 '17 at 14:10