
In Keras, I want to calculate the mean of the embedding outputs for the nonzero inputs only. What is the difference between mask_zero=True and mask_zero=False in the Embedding layer? I tried the code below:

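from keras import backend as K
from keras.layers import Input, Embedding, Lambda

input_data = Input(shape=(5,), dtype='int32', name='input')
embedding_layer = Embedding(1000, 24, input_length=5, mask_zero=True, name='embedding')
out = embedding_layer(input_data)  # shape: (batch, 5, 24)

def antirectifier(x):
    # Plain mean over the time axis; masked positions are still included.
    return K.mean(x, axis=1, keepdims=True)

def antirectifier_output_shape(input_shape):
    shape = list(input_shape)
    shape[1] = 1  # keepdims=True leaves a time axis of length 1
    return tuple(shape)

out = Lambda(antirectifier, output_shape=antirectifier_output_shape, name='lambda')(out)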

But the result seems to be the mean over all the elements. How can I calculate the mean over only the nonzero inputs?

Rajat
user9680322

1 Answer


From the Embedding layer's documentation for mask_zero:

If this is True then all subsequent layers in the model need to support masking

Your Lambda function doesn't support masking. Recurrent layers in Keras, for example, do. If you set mask_zero=True in your Embedding layer, every 0 index that you feed to it is flagged as "masked", and that flag is propagated to the following layers; the ones that understand the "masked" information will use it.
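To make the difference between the two settings concrete, here is a minimal sketch. It assumes tf.keras from TensorFlow 2.x rather than the standalone keras package used in the question, and the input ids are made up for illustration. The layer still returns an ordinary trainable vector for index 0 in both cases; the only thing mask_zero changes is whether a boolean mask accompanies the output.

```python
import numpy as np
import tensorflow as tf

# One padded sequence: two real tokens followed by three padding zeros.
ids = np.array([[3, 7, 0, 0, 0]], dtype="int32")

masked = tf.keras.layers.Embedding(1000, 24, mask_zero=True)
unmasked = tf.keras.layers.Embedding(1000, 24, mask_zero=False)

# compute_mask is what downstream layers receive alongside the embeddings.
mask_on = masked.compute_mask(ids)
mask_off = unmasked.compute_mask(ids)

print(np.array(mask_on).tolist())  # [[True, True, False, False, False]]
print(mask_off)                    # None
```

A layer that ignores this mask, such as a plain mean, therefore averages the padding vectors together with the real ones.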

Basically, if you build a "mean" layer that grabs the mask and computes the average only over the non-masked values, you will get the desired result.
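A mask-aware mean layer along these lines could look like the sketch below. It assumes tf.keras from TensorFlow 2.x rather than the standalone keras package from the question, and MaskedMean is a hypothetical name chosen for this example, not a built-in layer.

```python
import numpy as np
import tensorflow as tf

class MaskedMean(tf.keras.layers.Layer):
    """Averages over the time axis, skipping masked timesteps (hypothetical helper)."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.supports_masking = True  # tell Keras this layer accepts a mask

    def call(self, inputs, mask=None):
        if mask is None:
            return tf.reduce_mean(inputs, axis=1)
        # (batch, steps) booleans -> (batch, steps, 1) floats for broadcasting
        mask = tf.cast(mask, inputs.dtype)[:, :, tf.newaxis]
        total = tf.reduce_sum(inputs * mask, axis=1)
        # Guard against division by zero for all-padding sequences.
        count = tf.maximum(tf.reduce_sum(mask, axis=1), 1.0)
        return total / count

    def compute_mask(self, inputs, mask=None):
        # The time axis is reduced away, so no mask is passed on.
        return None

inputs = tf.keras.Input(shape=(5,), dtype="int32", name="input")
embedding = tf.keras.layers.Embedding(1000, 24, mask_zero=True, name="embedding")
outputs = MaskedMean(name="masked_mean")(embedding(inputs))
model = tf.keras.Model(inputs, outputs)

ids = np.array([[3, 7, 0, 0, 0]], dtype="int32")
print(model.predict(ids, verbose=0).shape)  # (1, 24)
```

Because compute_mask returns None, layers placed after MaskedMean (Dense layers, for instance) no longer receive a mask and do not need to support one.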

You can find here a way to build your lambda layers that support masking

I hope it helps.

Nassim Ben
  • If I use mask_zero=True, does that mean I must use layers that support masking throughout the whole network? I have some Dense layers which do not support masking. How can I handle this? Can I use a Lambda and not return a mask? – user9680322 Apr 23 '18 at 17:43
  • If you think about it, it makes sense: a Dense layer can't handle a mask and propagate it, since it trains a weight on every input neuron and does some computation on the sum... it isn't clear what it should do with masked inputs. If you use masking, the next layer should be able to handle it; an LSTM, for example, will do the job and compute its outputs while skipping masked values... after that LSTM you can put your Dense layers. – Nassim Ben Apr 23 '18 at 17:48