There was a problem running the test script 'local_test.sh' in DeepLab, but model_test.py works fine

Question

I copied DeepLab's source code from GitHub and configured all the files as required. model_test.py runs fine, but when I tried to run the local_test.sh test script, a series of problems occurred.

I can't make sense of the error message, so I don't know what went wrong or where to start.

2019-08-23 10:39:16.486931: W tensorflow/core/common_runtime/bfc_allocator.cc:319] *************************************************____********___****____************************xxxx
2019-08-23 10:39:16.487253: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at depthwise_conv_op.cc:365 : Resource exhausted: OOM when allocating tensor with shape[4,128,257,257] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "E:\anaconda\lib\site-packages\tensorflow\python\client\session.py", line 1356, in _do_call
    return fn(*args)
  File "E:\anaconda\lib\site-packages\tensorflow\python\client\session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "E:\anaconda\lib\site-packages\tensorflow\python\client\session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[4,128,257,257] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node xception_65/entry_flow/block1/unit_1/xception_module/separable_conv2_depthwise/depthwise}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

         [[gradients/AddN_56/_12764]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted: OOM when allocating tensor with shape[4,128,257,257] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node xception_65/entry_flow/block1/unit_1/xception_module/separable_conv2_depthwise/depthwise}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:/models-master/research/deeplab/train.py", line 517, in <module>
    tf.app.run()
  ...
  ...
  File "E:\anaconda\lib\site-packages\tensorflow\python\client\session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[4,128,257,257] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[node xception_65/entry_flow/block1/unit_1/xception_module/separable_conv2_depthwise/depthwise (defined at \models-master\research\deeplab\core\xception.py:175) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

         [[gradients/AddN_56/_12764]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted: OOM when allocating tensor with shape[4,128,257,257] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[node xception_65/entry_flow/block1/unit_1/xception_module/separable_conv2_depthwise/depthwise (defined at \models-master\research\deeplab\core\xception.py:175) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node xception_65/entry_flow/block1/unit_1/xception_module/separable_conv2_depthwise/depthwise:
 xception_65/entry_flow/block1/unit_1/xception_module/Relu_1 (defined at \models-master\research\deeplab\core\xception.py:274)

Input Source operations connected to node xception_65/entry_flow/block1/unit_1/xception_module/separable_conv2_depthwise/depthwise:
 xception_65/entry_flow/block1/unit_1/xception_module/Relu_1 (defined at \models-master\research\deeplab\core\xception.py:274)

Original stack trace for 'xception_65/entry_flow/block1/unit_1/xception_module/separable_conv2_depthwise/depthwise':
  File "/models-master/research/deeplab/train.py", line 517, in <module>
    tf.app.run()
  ...
  ...
  File "\anaconda\lib\site-packages\tensorflow\python\framework\ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

score 0 · Answer 1 · answered Dec 03 '19 at 00:58

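Your batch size or image size is larger than what your GPU memory can hold, which is what "Resource exhausted: OOM when allocating tensor" means. The tensor that failed to allocate has shape [4,128,257,257], where the leading 4 is the batch dimension. Try running the model with a smaller batch size and a smaller image size.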
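To get a feel for the numbers, here is a rough sketch that estimates the memory taken by a single tensor of the shape reported in the log. The helper name `tensor_bytes` is made up for illustration, and 4 bytes per element is assumed because the log says the tensor has type float. This covers only one activation; training keeps many such tensors (plus gradients) alive at once, so the real footprint is far larger.

```python
from functools import reduce
from operator import mul


def tensor_bytes(shape, bytes_per_element=4):
    """Bytes needed for one dense tensor (hypothetical helper, float32 assumed)."""
    return reduce(mul, shape, 1) * bytes_per_element


# Shape copied from the OOM message in the question.
failing_shape = (4, 128, 257, 257)

for batch in (4, 2, 1):
    shape = (batch,) + failing_shape[1:]
    mib = tensor_bytes(shape) / 2**20
    print(f"batch={batch}: {mib:.1f} MiB for this one tensor")
```

The size scales linearly with the batch dimension and with each spatial dimension, so halving the batch size halves this tensor, and shrinking the image in both height and width reduces it even faster.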

Manas
