
I am using `K.argmax()` (from the Keras backend) inside a Lambda layer. The model compiles fine but throws an error during `fit()`:

ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
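For context (my own illustration, in plain numpy rather than the Keras backend): argmax is piecewise-constant, so nudging its inputs almost never changes its output, which is why there is no useful gradient:

```python
import numpy as np

# argmax is piecewise-constant: perturbing any input slightly
# almost never changes which index is largest, so the derivative
# is zero almost everywhere (and undefined at ties).
probs = np.array([0.1, 0.5, 0.4])
best = np.argmax(probs)  # index 1

for i in range(len(probs)):
    for eps in (1e-6, -1e-6):
        nudged = probs.copy()
        nudged[i] += eps
        # the output is flat in a neighborhood of probs
        assert np.argmax(nudged) == best
```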

My model:

latent_dim = 512
encoder_inputs = Input(shape=(train_data.shape[1],))
encoder_dense = Dense(vocabulary, activation='softmax')
encoder_outputs = Embedding(vocabulary, latent_dim)(encoder_inputs)
encoder_outputs = LSTM(latent_dim, return_sequences=True)(encoder_outputs)
encoder_outputs = Dropout(0.5)(encoder_outputs)
encoder_outputs = encoder_dense(encoder_outputs)
encoder_outputs = Lambda(K.argmax, arguments={'axis':-1})(encoder_outputs)
encoder_outputs = Lambda(K.cast, arguments={'dtype':'float32'})(encoder_outputs)

encoder_dense1 = Dense(train_label.shape[1], activation='softmax')
decoder_embedding = Embedding(vocabulary, latent_dim)
decoder_lstm1 = LSTM(latent_dim, return_sequences=True)
decoder_lstm2 = LSTM(latent_dim, return_sequences=True)
decoder_dense2 = Dense(vocabulary, activation='softmax')

decoder_outputs = encoder_dense1(encoder_outputs)
decoder_outputs = decoder_embedding(decoder_outputs)
decoder_outputs = decoder_lstm1(decoder_outputs)
decoder_outputs = decoder_lstm2(decoder_outputs)
decoder_outputs = Dropout(0.5)(decoder_outputs)
decoder_outputs = decoder_dense2(decoder_outputs)
model = Model(encoder_inputs, decoder_outputs)
model.summary()

Model summary for easy visualizing:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_7 (InputLayer)         (None, 32)                0         
_________________________________________________________________
embedding_13 (Embedding)     (None, 32, 512)           2018816   
_________________________________________________________________
lstm_19 (LSTM)               (None, 32, 512)           2099200   
_________________________________________________________________
dropout_10 (Dropout)         (None, 32, 512)           0         
_________________________________________________________________
dense_19 (Dense)             (None, 32, 3943)          2022759   
_________________________________________________________________
lambda_5 (Lambda)            (None, 32)                0         
_________________________________________________________________
lambda_6 (Lambda)            (None, 32)                0         
_________________________________________________________________
dense_20 (Dense)             (None, 501)               16533     
_________________________________________________________________
embedding_14 (Embedding)     (None, 501, 512)          2018816   
_________________________________________________________________
lstm_20 (LSTM)               (None, 501, 512)          2099200   
_________________________________________________________________
lstm_21 (LSTM)               (None, 501, 512)          2099200   
_________________________________________________________________
dropout_11 (Dropout)         (None, 501, 512)          0         
_________________________________________________________________
dense_21 (Dense)             (None, 501, 3943)         2022759   
=================================================================
Total params: 14,397,283
Trainable params: 14,397,283
Non-trainable params: 0
_________________________________________________________________

I googled for a solution, but almost all the results were about a faulty model definition. Some recommended simply not using the functions that cause this error. However, as you can see, I cannot create this model without K.argmax (if you know another way, please tell me). How do I solve this issue so I can train my model?

Lcukerd
  • You have a huge conceptual problem: argmax has no gradient; it is not differentiable, so you cannot use it in your model. – Dr. Snoopy Sep 16 '18 at 22:35
  • Yes, I know argmax has no gradient, and I was hoping for a way to define it as something (like 0) to fix the error. I need a function like argmax for my model to work; do you know any other function that I can use? – Lcukerd Sep 16 '18 at 22:41
  • Again, another conceptual problem: you can't define the gradient of argmax. If you do, it will always be wrong, and then the model won't train because the information in the gradient will be completely wrong. – Dr. Snoopy Sep 16 '18 at 22:44
  • So you are saying there is no workaround for this error? Hence, I must use a different model (as this will clearly not work without argmax)? – Lcukerd Sep 16 '18 at 22:48
  • Yes, this won't work. Don't use operations that have no gradient. – Dr. Snoopy Sep 16 '18 at 22:58

1 Answer

-1

For obvious reasons there is no gradient for the argmax function; how would that even be defined? In order for your model to work, you need to make the layer non-trainable. As per this question (or the documentation), you pass trainable=False to the layer. As for the layer weights (if applicable), you would probably want to set them to an identity matrix.

Huang_d
  • Setting trainable = False does not help; I still get the same error. I think setting trainable = False on a layer that has no params has no effect. – Lcukerd Sep 16 '18 at 21:08