If the answer is yes, does that mean arbitrary ops/functions/methods can be backpropagated through? For example, what is the gradient of scalar_y w.r.t. tensor_x in the code below:
step1: scalar_y = tf.size(tensor_x)
OR scalar_y = paddle.fluid.layers.size(tensor_x)
step2: tensor_z = scalar_y * tensor_x
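For context, here is a minimal, self-contained sketch (assuming TF 2.x eager execution and tf.GradientTape; the tf.cast is only there because tf.size returns an int32 scalar, and the variable names are mine) that I would run to inspect what gradients TensorFlow actually reports for these two steps:

import tensorflow as tf

tensor_x = tf.constant([[1.0, 2.0], [3.0, 4.0]])

with tf.GradientTape(persistent=True) as tape:
    tape.watch(tensor_x)
    # step1: number of elements of tensor_x, cast to float so it can be
    # multiplied with the float tensor and used as a gradient target
    scalar_y = tf.cast(tf.size(tensor_x), tf.float32)
    # step2: scale tensor_x by its own element count
    tensor_z = scalar_y * tensor_x

# Does any gradient flow from scalar_y back to tensor_x, or is it None?
print(tape.gradient(scalar_y, tensor_x))
# Is scalar_y treated as a constant factor when differentiating tensor_z?
print(tape.gradient(tensor_z, tensor_x))

I am asking what these gradients are in principle, not just what this particular snippet prints.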
If the answer is no, then under what circumstances, or for which (types of) ops, does this happen, and why?
One more concrete example (an implementation of DropBlock): lines 34~37 in https://github.com/DHZS/tf-dropblock/blob/master/nets/dropblock.py
My question is whether it is necessary to wrap the mask-derived terms in the line below in tf.stop_gradient:
output = inputs * mask * tf.to_float(tf.size(mask)) / tf.reduce_sum(mask)
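To make the question concrete, this is roughly what I imagine the wrapped version would look like (a sketch of my own, not code from that repository; I use tf.cast in place of the deprecated tf.to_float, and whether the wrapping is needed at all is exactly what I am asking):

import tensorflow as tf

def scale_output(inputs, mask):
    # Normalization factor derived from the mask: element count / kept sum.
    scale = tf.cast(tf.size(mask), inputs.dtype) / tf.reduce_sum(mask)
    # Wrapped in tf.stop_gradient so backprop treats the factor as a constant.
    return inputs * mask * tf.stop_gradient(scale)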
Can anybody help?