
I'm training a model from scratch with TF-Slim:

https://github.com/tensorflow/models/tree/master/slim

Training fails with an error. I think it is a GPU vs. CPU device problem. Other code runs fine on this machine, but this script raises the error below.

I ran the following command:

python train_image_classifier.py \
    --train_dir=/home/sk/workspace/slim/datasets/log \
    --dataset_name=imagenet \
    --dataset_split_name=train \
    --dataset_dir=/home/sk/workspace/slim/datasets/imagenet \
    --model_name=inception_v3

and the error is:

Caused by op u'InceptionV3/Logits/Conv2d_1c_1x1/biases/RMSProp_1', defined at:
  File "/home/sk/workspace/slim/train_image_classifier.py", line 573, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/home/sk/workspace/slim/train_image_classifier.py", line 539, in main
    global_step=global_step)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 446, in apply_gradients
    self._create_slots([_get_variable_for(v) for v in var_list])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/rmsprop.py", line 103, in _create_slots
    self._zeros_slot(v, "momentum", self._name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 766, in _zeros_slot
    named_slots[_var_key(var)] = slot_creator.create_zeros_slot(var, op_name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 174, in create_zeros_slot
    colocate_with_primary=colocate_with_primary)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 146, in create_slot_with_initializer
    dtype)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 66, in _create_slot_var
    validate_shape=validate_shape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1049, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 948, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 356, in get_variable
    validate_shape=validate_shape, use_resource=use_resource)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 341, in _true_getter
    use_resource=use_resource)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 714, in _get_single_variable
    validate_shape=validate_shape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 197, in __init__
    expected_shape=expected_shape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 281, in _init_from_args
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/state_ops.py", line 128, in variable_op_v2
    shared_name=shared_name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_state_ops.py", line 708, in _variable_v2
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Cannot assign a device to node 'InceptionV3/Logits/Conv2d_1c_1x1/biases/RMSProp_1': Could not satisfy explicit device specification '/device:GPU:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0
Colocation Debug Info:
Colocation group had the following types and devices: 
ApplyRMSProp: CPU 
Const: CPU 
Assign: CPU 
IsVariableInitialized: CPU 
Identity: CPU 
VariableV2: CPU 
     [[Node: InceptionV3/Logits/Conv2d_1c_1x1/biases/RMSProp_1 = VariableV2[_class=["loc:@InceptionV3/Logits/Conv2d_1c_1x1/biases"], container="", dtype=DT_FLOAT, shape=[3], shared_name="", _device="/device:GPU:0"]()]]


Process finished with exit code 1
Suk Kyu Sun

1 Answer


It's trying to run some ops on the GPU, but TensorFlow doesn't see a GPU device (either because you're using the CPU version of TensorFlow, because of a CUDA installation issue, or because there is no GPU). It looks like you can specify --clone_on_cpu=True to use the CPU instead.
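One way to confirm which devices TensorFlow registers in the current process is to list them directly. The following is a minimal sketch, assuming a TensorFlow install that exposes `tensorflow.python.client.device_lib`; the helper names `list_tf_devices` and `has_gpu` are made up for illustration and are not part of TensorFlow or TF-Slim.

```python
def list_tf_devices():
    """Return (name, device_type) pairs TensorFlow can see, or None if TF is missing."""
    try:
        # Assumption: TensorFlow is installed and exposes device_lib.
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return [(d.name, d.device_type) for d in device_lib.list_local_devices()]


def has_gpu(devices):
    """True if any (name, device_type) pair has device_type 'GPU'."""
    return any(dev_type == "GPU" for _, dev_type in devices or [])


if __name__ == "__main__":
    devices = list_tf_devices()
    if devices is None:
        print("TensorFlow is not importable in this interpreter.")
    else:
        for name, dev_type in devices:
            print("%s %s" % (name, dev_type))
        print("GPU visible: %s" % has_gpu(devices))
```

If only a CPU entry is listed, the process has no GPU device registered, which matches the "available devices: /job:localhost/replica:0/task:0/cpu:0" part of the error.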

Allen Lavoie
  • I have a GPU (Titan X Pascal, 12 GB) and --clone_on_cpu=False (I checked it). I don't know what the problem is. I reinstalled TensorFlow and got the same error: "INFO:tensorflow:Error reported to Coordinator: , Cannot assign a device for operation 'InceptionV3/AuxLogits/Conv2d_2b_1x1/biases/RMSProp_1': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device." – Suk Kyu Sun Jun 30 '17 at 01:30
  • Ah, so the problem is just that the GPU hasn't been found. Have you followed the GPU instructions on https://www.tensorflow.org/install/install_linux? If so, please include the output of `nvidia-smi` in your question, along with your CUDA version. – Allen Lavoie Jun 30 '17 at 01:43
  • Output of nvidia-smi:

    NVIDIA-SMI 375.39                 Driver Version: 375.39
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  TITAN X (Pascal)    Off  | 0000:01:00.0      On |                  N/A |
    | 23%   37C    P8    16W / 250W |    300MiB / 12181MiB |     10%      Default |

    Other code works fine. – Suk Kyu Sun Jun 30 '17 at 02:13
  • And the following is the CUDA version (output of nvcc): nvcc: NVIDIA (R) Cuda compiler driver. Copyright (c) 2005-2016 NVIDIA Corporation. Built on Tue_Jan_10_13:22:03_CST_2017. Cuda compilation tools, release 8.0, V8.0.61. – Suk Kyu Sun Jun 30 '17 at 02:15
  • In that case it's pretty clear that you don't have the GPU version of TensorFlow. Please download one of the "with GPU support" pip packages from the install page. – Allen Lavoie Jun 30 '17 at 16:22
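To check which TensorFlow pip package is actually installed, the installed distributions can be queried by name. This is a rough sketch under two assumptions: the GPU build is published as the `tensorflow-gpu` pip distribution (as it was in 2017), and the script runs on Python 3.8+ where `importlib.metadata` is available. On the Python 2.7 setup shown in the traceback, `pip list` gives the same information. The function name `installed_tf_packages` is hypothetical.

```python
from importlib import metadata


def installed_tf_packages(candidates=("tensorflow", "tensorflow-gpu")):
    """Map each candidate distribution name that is installed to its version."""
    found = {}
    for name in candidates:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Distribution is not installed in this environment; skip it.
            pass
    return found


if __name__ == "__main__":
    packages = installed_tf_packages()
    if not packages:
        print("No TensorFlow distribution found in this environment.")
    for name, version in packages.items():
        print("%s==%s" % (name, version))
```

If only `tensorflow` appears and `tensorflow-gpu` does not, that would be consistent with the CPU-only build being installed.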