How to apply masking layer to sequential CNN model in Keras?

Question

I have a problem of applying masking layer to CNNs in RNN/LSTM model.

My data is not original image, but I converted into a shape of (16, 34, 4)(channels_first). The data is sequential, and the longest step length is 22. So for invariant way, I set the timestep as 22. Since it may be shorter than 22 steps, I fill others with np.zeros. However, for 0 padding data, it's about half among all dataset, so with 0 paddings, the training cannot reach a very good result with so much useless data. Then I want to add a mask to cancel these 0 padding data.

Here is my code.

mask = np.zeros((16,34,4), dtype = np.int8)  
input_shape = (22, 16, 34, 4)  
model = Sequential()  
model.add(TimeDistributed(Masking(mask_value=mask), input_shape=input_shape, name = 'mask'))  
model.add(TimeDistributed(Conv2D(100, (5, 2), data_format = 'channels_first', activation = relu), name = 'conv1'))  
model.add(TimeDistributed(BatchNormalization(), name = 'bn1'))  
model.add(Dropout(0.5, name = 'drop1'))  
model.add(TimeDistributed(Conv2D(100, (5, 2), data_format = 'channels_first', activation = relu), name ='conv2'))  
model.add(TimeDistributed(BatchNormalization(), name = 'bn2'))  
model.add(Dropout(0.5, name = 'drop2'))  
model.add(TimeDistributed(Conv2D(100, (5, 2), data_format = 'channels_first', activation = relu), name ='conv3'))  
model.add(TimeDistributed(BatchNormalization(), name = 'bn3'))  
model.add(Dropout(0.5, name = 'drop3'))  
model.add(TimeDistributed(Flatten(), name = 'flatten'))  
model.add(GRU(256, activation='tanh', return_sequences=True, name = 'gru'))  
model.add(Dropout(0.4, name = 'drop_gru'))  
model.add(Dense(35, activation = 'softmax', name = 'softmax'))  
model.compile(optimizer='Adam',loss='categorical_crossentropy',metrics=['acc'])

Here's the model structure.
model.summary():

_________________________________________________________________  
Layer (type)                 Output Shape              Param #     
=================================================================  
mask (TimeDist (None, 22, 16, 34, 4)     0           
_________________________________________________________________  
conv1 (TimeDistributed)      (None, 22, 100, 30, 3)    16100       
_________________________________________________________________  
bn1 (TimeDistributed)        (None, 22, 100, 30, 3)    12          
_________________________________________________________________  
drop1 (Dropout)              (None, 22, 100, 30, 3)    0           
_________________________________________________________________  
conv2 (TimeDistributed)      (None, 22, 100, 26, 2)    100100      
_________________________________________________________________  
bn2 (TimeDistributed)        (None, 22, 100, 26, 2)    8           
_________________________________________________________________  
drop2 (Dropout)              (None, 22, 100, 26, 2)    0           
_________________________________________________________________  
conv3 (TimeDistributed)      (None, 22, 100, 22, 1)    100100      
_________________________________________________________________  
bn3 (TimeDistributed)        (None, 22, 100, 22, 1)    4           
_________________________________________________________________  
drop3 (Dropout)              (None, 22, 100, 22, 1)    0           
_________________________________________________________________  
flatten (TimeDistributed)    (None, 22, 2200)          0           
_________________________________________________________________  
gru (GRU)                    (None, 22, 256)           1886976     
_________________________________________________________________  
drop_gru (Dropout)           (None, 22, 256)           0           
_________________________________________________________________  
softmax (Dense)              (None, 22, 35)            8995        
=================================================================  
Total params: 2,112,295  
Trainable params: 2,112,283  
Non-trainable params: 12  
_________________________________________________________________

For mask_value, I tried with either 0 or this mask structure, but neither works and it still trains through all the data with half 0 paddings in it.
Can anyone help me?

B.T.W., I used TimeDistributed here to connect RNN, and I know another one called ConvLSTM2D. Does anyone know the difference? ConvLSTM2D takes much more params for the model, and get training much slower than TimeDistributed...

lunguini · Answer 1 · 2019-02-05T02:54:07.930

Unfortunately masking is not yet supported by the Keras Conv layers. There have been several issues posted about this on the Keras Github page, here is the one with the most substantial conversation on the topic. It appears that there was some hang up implementation details and the issue was never resolved.

The workaround proposed in the discussion is to have an explicit embedding for the padding character in sequences and do global pooling. Here is another workaround I found (not helpful for my use case but maybe helpful to you) - keeping a mask array to merge through multiplication.

You can also check out the conversation around this question which is similar to yours.

How to apply masking layer to sequential CNN model in Keras?

1 Answers1

Linked