
I am having trouble with certain aspects of the Keras LSTM implementation. Here is a description of my problem:

I am trying to train a model for word correctness prediction. My model has two types of inputs:

  1. A word sequence (sentence)
  2. A sequence of feature vectors (for each word I compute a feature vector of 6 values).

e.g.

input_1 = ['we', 'have', 'two', 'review'] 
input_2 = [
           [1.25, 0.01, 0.000787, 5.235, 0.0, 0.002091], 
           [ 0.0787, 0.02342, 5.4595, 0.002091, 0.003477, 0.0], 
           [0.371533, 0.529893, 0.371533, 0.6, 0.0194156, 0.003297],
           [0.471533, 0.635, 0.458, 0.7, 0.0194156, 0.0287]
          ] 

gives output = [1, 1, 2, 1]

As each sentence in my training set has a different length, I have to zero-pad all of my sentences so that they all have the same length.

My question is about the second input: should I pad it as well, and how, given that its elements are vectors?

Model architecture:

import keras
from keras.layers import Input, Embedding, LSTM, Dense, TimeDistributed
from keras.models import Model

input1 = Input(shape=(seq_length,), dtype='int32')
emb = Embedding(input_dim=num_words, output_dim=num_dimension,
                input_length=seq_length, weights=[embeddings],
                mask_zero=True, trainable=False)(input1)

input2 = Input(shape=(seq_length, 6))
x = keras.layers.concatenate([emb, input2], axis=2)

forwards = LSTM(64, return_sequences=True)(x)
backwards = LSTM(128, return_sequences=True, go_backwards=True)(x)

common = keras.layers.concatenate([forwards, backwards], axis=-1)
out = TimeDistributed(Dense(no_targets, activation='softmax'))(common)

model = Model(inputs=[input1, input2], outputs=out)
Areza

1 Answer

You are on the right track, and yes, you would need to pad your second input with zero rows to match the padded sentence lengths. Essentially it would look like this:

# Input 1
X1 = [[12, 34, 3], [6, 7, 0]] # where numbers are word indices and 0 is padding
# Input 2
X2 = [[[1.23, ..., 2.4], [1.24, ...], [0.6, ...]], [[3.25, ...], [2.4, ...], [0, 0, 0, 0, 0, 0]]]
# So the padded words get zero feature vectors as well and the shapes match
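
For concreteness, here is a minimal padding sketch; the names tokenized_sentences, feature_sequences and max_len are illustrative placeholders, not from the original post:

import numpy as np
from keras.preprocessing.sequence import pad_sequences

# tokenized_sentences: list of lists of word indices (index 0 reserved for padding)
# feature_sequences:   list of lists of 6-dimensional feature vectors
max_len = max(len(s) for s in tokenized_sentences)

# Pad the word-index input with trailing zeros
X1 = pad_sequences(tokenized_sentences, maxlen=max_len, padding='post', value=0)

# Pad the feature input with all-zero rows so the shapes line up: (num_samples, max_len, 6)
X2 = np.zeros((len(feature_sequences), max_len, 6), dtype='float32')
for i, feats in enumerate(feature_sequences):
    X2[i, :len(feats), :] = feats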

But fear not: because you concatenate `emb` with `input2`, the mask created by `mask_zero=True` is also propagated to the concatenated tensor, so the LSTM ignores the padded timesteps coming from the second input as well.

nuric
  • Thank you, I had already tried this even though I was not sure. I get an accuracy of 75% after 1 epoch. With more training the validation accuracy starts declining while the training accuracy keeps climbing - a clear sign of overfitting. Any ideas? – user3487059 Nov 02 '18 at 18:35
  • Well, yes, you can add `dropout=0.2` to the LSTM to reduce overfitting, or even add dropout to the embedding. Another option is to reduce the number of units in each layer to lower the capacity of the model. – nuric Nov 02 '18 at 22:08
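
For reference, a minimal sketch of the regularization suggestions from the comment above; the exact rates and the Dropout layer on the embedding output are illustrative choices, not something stated in the original post:

from keras.layers import Dropout

emb = Embedding(input_dim=num_words, output_dim=num_dimension,
                input_length=seq_length, weights=[embeddings],
                mask_zero=True, trainable=False)(input1)
emb = Dropout(0.2)(emb)  # dropout on the embedding output

x = keras.layers.concatenate([emb, input2], axis=2)
forwards = LSTM(64, return_sequences=True, dropout=0.2)(x)    # dropout on the LSTM inputs
backwards = LSTM(128, return_sequences=True, dropout=0.2, go_backwards=True)(x)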