I get this error when I try to train an image detector on my own images on Google Colab with a TPU:

From /job:worker/replica:0/task:0: Compilation failure: Asked to propagate a dynamic dimension from hlo %convert.283 = f32[1,80,80,32]{3,2,1,0} convert(f32[1,80,80,32]{3,2,1,0} %add.1), metadata={op_type="FusedBatchNorm" op_name="bn_Conv1_3/FusedBatchNorm"}@{}@0 to hlo %clamp.288 = f32[1,80,80,32]{3,2,1,0} clamp(f32[1,80,80,32]{3,2,1,0} %broadcast.286, f32[1,80,80,32]{3,2,1,0} %convert.283, f32[1,80,80,32]{3,2,1,0} %broadcast.287), metadata={op_type="Relu6" op_name="Conv1_relu_3/Relu6"}, which is not implemented. TPU compilation failed [[node TPUReplicateMetadata_1 (defined at :24) ]]

Here's the link to the code:

https://drive.google.com/open?id=1mPiod1At85RgNwHx4vYFxH38Ck16Ep1m

Do you have any idea what is going on?

It should not be a problem with the image size or the batch size; I have already checked both.

Thanks.

arun v
aa bb

1 Answer

I think the problem comes from your labels. Please try the following code:

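import tensorflow as tf

# One-hot encode the integer labels (to_categorical returns float32 by default).
y_train = tf.keras.utils.to_categorical(labels, NUM_CLASSES)
y_test = tf.keras.utils.to_categorical(labelstest, NUM_CLASSES)
# Adding zeros turns the NumPy arrays into tensors; the dtype matches the labels.
zeros = tf.zeros([NUM_CLASSES], tf.float32)
y_train = tf.math.add(y_train, zeros)
y_test = tf.math.add(y_test, zeros)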
  • I got this error too (the details are slightly different). In my case, I don't think it is the model or the shape of the labels. I used tf.image and other ops for data augmentation as part of the tf.data.Dataset pipeline, and the error goes away if I turn data augmentation off, so I still need to isolate where the problem is. If you have a hint, please comment. – kawingkelvin May 29 '21 at 16:12
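
One possible way to narrow this down, as a rough sketch rather than a confirmed fix: TPU compilation generally needs static shapes, so it may help to check whether any stage of the tf.data pipeline leaves a dimension undefined. The snippet assumes TensorFlow 2.x (for `element_spec`). The names `augment`, `has_static_shape`, `IMG_SIZE`, and `BATCH_SIZE`, and the dummy data, are made up for illustration and are not taken from the question's code.

```python
import tensorflow as tf

IMG_SIZE = 160   # hypothetical target size
BATCH_SIZE = 8   # hypothetical batch size


def augment(image, label):
    """Hypothetical augmentation: random flip, then resize to a fixed size."""
    image = tf.image.random_flip_left_right(image)
    image = tf.image.resize(image, [IMG_SIZE, IMG_SIZE])
    return image, label


def has_static_shape(dataset):
    """Return True if every component of the dataset has a fully defined shape."""
    specs = tf.nest.flatten(dataset.element_spec)
    return all(spec.shape.is_fully_defined() for spec in specs)


# Dummy data standing in for the real images and labels.
images = tf.zeros([20, 200, 200, 3])
labels = tf.zeros([20], tf.int32)
ds = tf.data.Dataset.from_tensor_slices((images, labels)).map(augment)

# Without drop_remainder the batch dimension stays unknown (None).
dynamic_ds = ds.batch(BATCH_SIZE)
# With drop_remainder=True the batch dimension is fixed to BATCH_SIZE.
static_ds = ds.batch(BATCH_SIZE, drop_remainder=True)

print(dynamic_ds.element_spec)
print(has_static_shape(dynamic_ds))
print(static_ds.element_spec)
print(has_static_shape(static_ds))
```

Calling `has_static_shape` after each `map` or `batch` step of the real pipeline should show which stage introduces a `None` dimension. If an augmentation op is the culprit, resizing to a fixed size afterwards or calling `set_shape` on the image are options worth trying.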