15

I need to feed variable-length sequences into my model.

My model is Embedding + LSTM + Conv1D + MaxPooling + softmax.

When I set mask_zero=True in the Embedding layer, the model fails to compile at the Conv1D layer.

How can I pass a mask value into Conv1D, or is there another solution?

Jaspn Wjbian
  • 1
    Does it compile if the LSTM comes after the convolutional and max-pooling layers? – convolutionBoy Apr 13 '17 at 16:02
  • 1
    @convolutionBoy It still fails with the LSTM after the Conv layer. I found the issue on GitHub: apart from RNN and TimeDistributed layers, no other layers support masking. – Jaspn Wjbian Apr 14 '17 at 01:50

2 Answers

8

The Masking layer expects every downstream layer to support masking, which is not the case for the Conv1D layer. Fortunately, there is another way to apply masking, using the Functional API:

from keras.layers import Input, Masking, Embedding, LSTM, Conv1D
from keras.models import Model

inputs = Input(...)
mask = Masking().compute_mask(inputs)  # <= Compute the mask
embed = Embedding(...)(inputs)
lstm = LSTM(...)(embed, mask=mask)  # <= Apply the mask
conv = Conv1D(...)(lstm)
...
model = Model(inputs=[inputs], outputs=[...])
MiniQuark
  • If I understand correctly, this solution works if the data type is integer (for example, if the inputs are words or characters to be embedded). Is there a way to do this with float data (for example, a series of physical measurements), where embedding is not appropriate? – Itamar Mushkin Jan 08 '20 at 12:26
  • If you tried and it doesn't work with floats, then a last-resort option is to write your own custom layer: class YourCustomLayer(keras.layers.Layer): ... def call(self, inputs, mask): ... – MiniQuark Apr 01 '20 at 21:11
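For reference, here is one possible shape of that custom-layer idea. This is only a sketch written against tf.keras; the class name, the zeroing of masked timesteps, and the layer sizes are illustrative assumptions rather than code from the answer or the comments:

import tensorflow as tf
from tensorflow import keras

class MaskedConv1D(keras.layers.Layer):
    """Zeroes out masked timesteps, applies a Conv1D, and passes the mask on."""
    def __init__(self, filters, kernel_size, **kwargs):
        super().__init__(**kwargs)
        self.supports_masking = True  # the default compute_mask then forwards the mask unchanged
        self.conv = keras.layers.Conv1D(filters, kernel_size, padding='same')

    def call(self, inputs, mask=None):
        if mask is not None:
            # mask has shape (batch, timesteps); broadcast it over the feature axis
            inputs = inputs * tf.cast(mask, inputs.dtype)[:, :, tf.newaxis]
        return self.conv(inputs)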
7

The Conv1D layer does not support masking at this time. There is an open issue about it on the Keras repo.

Depending on the task, you might be able to get away with embedding the mask_value just like the other values in the sequence and applying global pooling (as you're doing now).
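A minimal sketch of that workaround, assuming zero-padded integer sequences and written against tf.keras; the vocabulary size, layer widths, and number of classes are placeholders, not values from the question:

from tensorflow.keras.layers import Input, Embedding, LSTM, Conv1D, GlobalMaxPooling1D, Dense
from tensorflow.keras.models import Model

inputs = Input(shape=(None,), dtype='int32')             # variable-length, zero-padded token ids
x = Embedding(input_dim=10000, output_dim=128)(inputs)   # no mask_zero: the pad id 0 simply gets its own embedding
x = LSTM(64, return_sequences=True)(x)
x = Conv1D(64, 3, padding='same', activation='relu')(x)
x = GlobalMaxPooling1D()(x)                              # pooling over time collapses the padded steps
outputs = Dense(10, activation='softmax')(x)
model = Model(inputs=inputs, outputs=outputs)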

parsethis
  • Thanks for the answer. In your solution, do I remove mask_zero=True and give the padding its own embedding vector, so that after max pooling the padded timesteps gradually get neglected during training? Is that what you meant? Or the solution from the GitHub issue: insert a tensor like [1,1,1,0,0,0], where 0 represents padding, multiply it with the conv output, and feed the result into the max-pooling layer? What do you think of that solution? – Jaspn Wjbian Apr 14 '17 at 02:01
  • It's not clear to me how it works. Could you please provide code examples? Thanks. – user4918159 Aug 23 '20 at 00:17
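For later readers, a rough sketch of the mask-multiplication idea discussed in the first comment above, written against tf.keras and not taken from the original answer; the layer sizes and the assumption of zero-padded integer inputs are placeholders:

import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, LSTM, Conv1D, Lambda, GlobalMaxPooling1D, Dense
from tensorflow.keras.models import Model

tokens = Input(shape=(None,), dtype='int32')              # zero-padded token ids
x = Embedding(input_dim=10000, output_dim=128)(tokens)
x = LSTM(64, return_sequences=True)(x)
x = Conv1D(64, 3, padding='same', activation='relu')(x)

# Build a (batch, timesteps, 1) binary mask from the padded ids (0 = pad)
mask = Lambda(lambda t: tf.cast(tf.not_equal(t, 0), tf.float32)[:, :, tf.newaxis])(tokens)

# Zero out the conv output at padded timesteps before pooling
x = Lambda(lambda t: t[0] * t[1])([x, mask])
x = GlobalMaxPooling1D()(x)
outputs = Dense(10, activation='softmax')(x)
model = Model(tokens, outputs)

With a ReLU before the pooling, the zeroed padded positions can never exceed a positive activation from a real timestep, so the global max pooling effectively ignores them.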