I have a neural network whose final layer is a flattened layer that outputs a curve. The curve might have one or more local maxima, but I am only interested in finding the global maximum. The ground truth I have is the integer index (or argument) of the global maximum. I tried to write a custom loss in Keras like this:
    import tensorflow.keras.backend as K

    def custom_loss(y_true, y_pred):
        # y_true: ground-truth index, y_pred: output curve; argmax breaks the gradient
        pred_idx = K.cast(K.argmax(y_pred, axis=-1), 'float32')
        return K.mean(K.square(pred_idx - K.flatten(y_true)))
I also cast the integers to float32, but I am getting an error saying the gradient is None. I searched for an answer and found that argmax doesn't have a defined gradient. The suggestions I found were to either create a custom Argmax layer or to use softmax instead.
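Here is a minimal reproduction of the failure; the model architecture and data shapes are arbitrary placeholders I made up, but the error is the same:

    import numpy as np
    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Dense(16, activation='relu', input_shape=(8,)),
        layers.Dense(100),  # flattened curve of length 100
    ])
    model.compile(optimizer='adam', loss=custom_loss)

    x = np.random.rand(32, 8).astype('float32')
    y = np.random.randint(0, 100, size=(32, 1)).astype('float32')
    model.fit(x, y, epochs=1)  # fails: no gradients provided for any variable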
How do I even use softmax here? Softmax only gives me an approximation of a one-hot vector like [0 0 1 ...], not the integer index itself. How am I supposed to work with that?

I even tried to treat it as a classification problem by turning the ground truth into one-hot vectors like [0 1 0 ...] and using crossentropy, but the network could not learn anything. It actually did better when I just added a Dense(1) layer and trained the model to regress the index directly. It seems that classification treats all indices as equally wrong, but that is not the case here: a prediction close to the true index should be penalized less, which is why I want the Euclidean (L2) distance on the index.
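In case it helps, my classification attempt looked roughly like this (reconstructed from memory, with placeholder layer sizes and curve length):

    import numpy as np
    from tensorflow.keras import layers, models, utils

    curve_len = 100  # placeholder length of the output curve
    clf = models.Sequential([
        layers.Dense(16, activation='relu', input_shape=(8,)),
        layers.Dense(curve_len, activation='softmax'),
    ])
    clf.compile(optimizer='adam', loss='categorical_crossentropy')

    y_idx = np.random.randint(0, curve_len, size=(32,))            # toy indices
    y_onehot = utils.to_categorical(y_idx, num_classes=curve_len)  # [0 1 0 ...]
    clf.fit(np.random.rand(32, 8).astype('float32'), y_onehot, epochs=1)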
Where can I find proper instructions for creating a custom Argmax layer? Would it even help in my situation? Is it possible to implement a custom loss function for this that is differentiable? What should I do?
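For concreteness, here is my best guess at what the softmax suggestion might mean: a "soft argmax" that takes the expected index under a sharply peaked softmax distribution, which approximates the argmax while keeping gradients everywhere. The name soft_argmax_loss and the sharpening factor beta are my own inventions, so I am not sure this is the intended approach:

    import tensorflow.keras.backend as K

    def soft_argmax_loss(y_true, y_pred):
        beta = 10.0  # sharpening factor; my guess, larger = closer to hard argmax
        probs = K.softmax(beta * y_pred)  # peaked distribution over curve indices
        idx = K.cast(K.arange(0, K.shape(y_pred)[-1]), 'float32')
        expected_idx = K.sum(probs * idx, axis=-1)  # differentiable argmax surrogate
        return K.mean(K.square(expected_idx - K.flatten(y_true)))

Would something like this be the right direction?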