INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, 2 root error(s) found

Question

I am trying to train an object detection model with the TensorFlow Object Detection API. My goal is to solve a captcha problem using object detection, and I am following a tutorial for that.

System configuration: virtual machine on Azure; GPU: NVIDIA Tesla K80; RAM: 56; TensorFlow version: 1.14.

The model starts training but stops after 16 iterations. Up to that point it runs fine and the loss decreases, but then it raises the error below. I am using faster_rcnn_inception_resnet_v2_atrous_coco. I have set every path required to run the model. I feed the input in batches of 2; with a larger batch size I get a resource-exhausted error.

INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, 2 root error(s) found.
  (0) Invalid argument: ConcatOp : Dimensions of inputs should match: shape[0] = [1,284,1024,3] vs. shape[1]= [1,296,1024,3]
         [[node concat (defined at /root/workspace/models/research/object_detection/legacy/trainer.py:191) ]]
         [[gradients/FirstStageFeatureExtractor/InceptionResnetV2/InceptionResnetV2/Repeat/block35_3/Conv2d_1x1/BiasAdd_grad/BiasAddGrad/_6083]]
  (1) Invalid argument: ConcatOp : Dimensions of inputs should match: shape[0] = [1,284,1024,3] vs. shape[1]= [1,296,1024,3]
         [[node concat (defined at /root/workspace/models/research/object_detection/legacy/trainer.py:191) ]]
0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node concat:
 Preprocessor_1/sub (defined at /root/workspace/models/research/object_detection/models/faster_rcnn_inception_resnet_v2_feature_extractor.py:77)

Input Source operations connected to node concat:
 Preprocessor_1/sub (defined at /root/workspace/models/research/object_detection/models/faster_rcnn_inception_resnet_v2_feature_extractor.py:77)

Original stack trace for 'concat':
  File "legacy/train.py", line 185, in <module>
    tf.app.run()
  File "/opt/sft/miniconda3/envs/ravi_gpu/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/opt/sft/miniconda3/envs/ravi_gpu/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/opt/sft/miniconda3/envs/ravi_gpu/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/opt/sft/miniconda3/envs/ravi_gpu/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "legacy/train.py", line 181, in main
    graph_hook_fn=graph_rewriter_fn)
  File "/root/workspace/models/research/object_detection/legacy/trainer.py", line 297, in train
    clones = model_deploy.create_clones(deploy_config, model_fn, [input_queue])
  File "/root/workspace/models/research/slim/deployment/model_deploy.py", line 194, in create_clones
    outputs = model_fn(*args, **kwargs)
  File "/root/workspace/models/research/object_detection/legacy/trainer.py", line 191, in _create_losses
    images = tf.concat(preprocessed_images, 0)
  File "/opt/sft/miniconda3/envs/ravi_gpu/lib/python3.6/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/opt/sft/miniconda3/envs/ravi_gpu/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1299, in concat
    return gen_array_ops.concat_v2(values=values, axis=axis, name=name)
  File "/opt/sft/miniconda3/envs/ravi_gpu/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1256, in concat_v2
    "ConcatV2", values=values, axis=axis, name=name)
  File "/opt/sft/miniconda3/envs/ravi_gpu/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/opt/sft/miniconda3/envs/ravi_gpu/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/opt/sft/miniconda3/envs/ravi_gpu/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "/opt/sft/miniconda3/envs/ravi_gpu/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

I0120 12:23:47.601684 140495582246720 coordinator.py:224] Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, 2 root error(s) found.
  (0) Invalid argument: ConcatOp : Dimensions of inputs should match: shape[0] = [1,284,1024,3] vs. shape[1]= [1,296,1024,3]
         [[node concat (defined at /root/workspace/models/research/object_detection/legacy/trainer.py:191) ]]
         [[gradients/FirstStageFeatureExtractor/InceptionResnetV2/InceptionResnetV2/Repeat/block35_3/Conv2d_1x1/BiasAdd_grad/BiasAddGrad/_6083]]
  (1) Invalid argument: ConcatOp : Dimensions of inputs should match: shape[0] = [1,284,1024,3] vs. shape[1]= [1,296,1024,3]
         [[node concat (defined at /root/workspace/models/research/object_detection/legacy/trainer.py:191) ]]
0 successful operations.
0 derived errors ignored.

One image in your dataset has a different shape from the others (look at the error message). Your program works fine until it tries to process that image, and then it fails. Either preprocess your dataset so that all images have the same shape, or add a preprocessing step to your pipeline that resizes all images to a common shape. — GPhilo, Jan 20 '20 at 12:42
@Ravi kant Gautam, can you please confirm whether the error is resolved by the comment above? If not, can you share reproducible code so that I can try to help you? — , Jun 04 '20 at 06:26
I didn't implement the comment above. I just changed my model from Faster R-CNN to Single Shot Detection (SSD), and that removed the error. — Ravi kant Gautam, Jun 05 '20 at 15:46
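One possible reason two images in the same batch end up with shapes like [1,284,1024,3] and [1,296,1024,3] is an aspect-ratio-preserving resize step in the input pipeline. The sketch below is a simplified model of such a resizer, not the library's code. The helper name keep_aspect_ratio_size, the limits of 600 and 1024, and the two image sizes are all assumptions for illustration; none of them come from the question.

```python
def keep_aspect_ratio_size(height, width, min_dim=600, max_dim=1024):
    """Approximate output size of an aspect-preserving resize (hypothetical helper)."""
    # First try to scale the shorter side to min_dim (assumed limit).
    scale = min_dim / min(height, width)
    if round(max(height, width) * scale) > max_dim:
        # The longer side would exceed max_dim, so cap the longer side instead.
        scale = max_dim / max(height, width)
    return round(height * scale), round(width * scale)


# Made-up captcha sizes: same width, slightly different heights.
print(keep_aspect_ratio_size(250, 900))  # (284, 1024)
print(keep_aspect_ratio_size(260, 900))  # (296, 1024)
```

If the pipeline behaves like this, source images whose aspect ratios differ only slightly come out with different heights, and a batch of 2 can no longer be concatenated. Under that assumption, the options are a batch size of 1 or a resize step that gives every image the same fixed shape.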

score 0 · Answer 1 · answered Jun 05 '20 at 15:33

You get this error when the tensors passed to tf.concat differ in size along a dimension other than the concatenation axis. Below is the code to reproduce the error you are facing.

Code to reproduce the error -

import tensorflow as tf
t1 = tf.constant([[1, 2, 3], [4, 5, 6]])
t2 = tf.constant([[7, 8, 9,10], [10, 11, 12,14]])
tf.concat([t1, t2],0)

Output -

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-78-333c8942fc7b> in <module>()
      2 t1 = tf.constant([[1, 2, 3], [4, 5, 6]])
      3 t2 = tf.constant([[7, 8, 9,10], [10, 11, 12,14]])
----> 4 tf.concat([t1, t2],0)

4 frames
/usr/local/lib/python3.6/dist-packages/six.py in raise_from(value, from_value)

InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [2,3] vs. shape[1] = [2,4] [Op:ConcatV2] name: concat

Solution - In my case, I can pad the smaller tensor with zeros so that it matches the shape of the larger tensor. In your use case, since you are dealing with images, you can use tf.image.resize to bring the images to a common shape (for example, resizing the larger image to the smaller image's shape) before passing them to tf.concat.

Fixed Code -

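import tensorflow as tf

t1 = tf.constant([[1, 2, 3], [4, 5, 6]])
t2 = tf.constant([[7, 8, 9, 10], [10, 11, 12, 14]])

# Number of columns t1 is missing compared with t2.
pad_cols = t2.shape[1] - t1.shape[1]

# Pad t1 with zeros on the right of axis 1 so both tensors have shape [2, 4].
t3 = tf.pad(t1, [[0, 0], [0, pad_cols]])

# The shapes now match on every axis except the concatenation axis.
tf.concat([t3, t2], 0)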

Output -

<tf.Tensor: shape=(4, 4), dtype=int32, numpy=
array([[ 1,  2,  3,  0],
       [ 4,  5,  6,  0],
       [ 7,  8,  9, 10],
       [10, 11, 12, 14]], dtype=int32)>
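
For the image case mentioned in the solution, here is a minimal sketch of resizing before concatenation. The two tensors are stand-ins with the shapes from the error message, and the 284 x 1024 target size is an illustrative assumption, not a value taken from the question's configuration.

```python
import tensorflow as tf

# Stand-ins for two preprocessed images with the mismatched shapes from the
# error message: (batch, height, width, channels).
img_a = tf.zeros([1, 284, 1024, 3])
img_b = tf.ones([1, 296, 1024, 3])

# Resize both to one target size given as (height, width). 284 x 1024 is an
# assumed, illustrative choice.
target_size = [284, 1024]
resized = [tf.image.resize(img, target_size) for img in (img_a, img_b)]

# With matching shapes, concatenating along the batch axis succeeds.
batch = tf.concat(resized, axis=0)
print(batch.shape)  # (2, 284, 1024, 3)
```

Resizing slightly distorts the image that had a different height. Zero-padding the smaller image, as in the fixed code above, is an alternative that leaves pixel content untouched.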

Hope this answers your question. Happy Learning.

@Ravi kant Gautam - Hope we have answered your question. Can you please accept and upvote the answer if you are satisfied with the answer. — , Jun 05 '20 at 18:19
Can you help me with this, please? https://stackoverflow.com/questions/68225332/invalid-argument-in0-mismatch-in1-shape — user, Jul 03 '21 at 23:46
