I'm trying to use a custom square-root activation function in a Keras sequential model (specifically for the MNIST dataset). When I implement it with tf.math.sqrt(x), training goes smoothly and the model is quite accurate. However, when I implement it with tf.math.pow(x, 0.5), the model fails to train and the loss goes to NaN.
I'm really unsure why this happens, because I would expect the two expressions to be mathematically identical.
Square root function
def tfsqrt(x):
    cond = tf.greater_equal(x, 0)
    return tf.where(cond, tf.math.sqrt(x), -tf.math.sqrt(-x))
Power function
def pwsqrt(x):
    cond = tf.greater_equal(x, 0)
    return tf.where(cond, tf.math.pow(x, 0.5), -tf.math.pow(-x, 0.5))
If anybody could explain this unexpected behavior, that would be much appreciated. Thanks!