Actually, your problem is quite common in tasks like NLP where sequences have different lengths. In your comment you discard all of the previous outputs by using return_sequences=False,
which is not common practice and usually results in a lower-performing model.
Note: there is no single best solution in neural network architecture design.
Here is what I can suggest.
Method 1 (No custom layer required)
You can use the same latent dimension in both LSTMs, concatenate their outputs along the time dimension, and treat the result as one big hidden-state tensor.
from keras.layers import Input, LSTM, Dense, Flatten, concatenate  # used in all snippets below

input1 = Input(shape=(50, 1))
x1 = LSTM(100, return_sequences=True)(input1)
input2 = Input(shape=(25, 1))
x2 = LSTM(100, return_sequences=True)(input2)
x = concatenate([x1, x2], axis=1)
# output dimension = (None, 75, 100)
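If it helps, here is one way to finish this into a trainable model. This is only a sketch: the LSTM(64)/Dense(1) head, the optimizer, and the loss are placeholders for whatever your actual task needs.
from keras.models import Model

# continuing from the snippet above
x = LSTM(64)(x)                       # summarize the merged sequence -> (None, 64)
output = Dense(1)(x)                  # replace with your actual output layer
model = Model(inputs=[input1, input2], outputs=output)
model.compile(optimizer='adam', loss='mse')
model.summary()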
If you do not want to use the same latent dimension, what others do is add one more component, usually called a mapping layer, which consists of stacked dense layers. This approach has more parameters, which makes the model harder to train.
input1 = Input(shape=(50, 1))
x1 = LSTM(100, return_sequences=True)(input1)
input2 = Input(shape=(25, 1))
x2 = LSTM(50, return_sequences=True)(input2)
# normally we would stack more than one Dense layer here
map_x1 = Dense(75)(x1)
map_x2 = Dense(75)(x2)
x = concatenate([map_x1, map_x2], axis=1)
# output dimension = (None, 75, 75)
Or flatten both outputs:
input1 = Input(shape=(50, 1))
x1 = LSTM(100, return_sequences=True)(input1)
input2 = Input(shape=(25, 1))
x2 = LSTM(50, return_sequences=True)(input2)
flat_x1 = Flatten()(x1)   # (None, 50*100)
flat_x2 = Flatten()(x2)   # (None, 25*50)
x = concatenate([flat_x1, flat_x2], axis=1)
# output dimension = (None, 6250)
Method 2 (custom layer required)
Create a custom layer that uses an attention mechanism to produce an attention vector, and use that attention vector as the representation of your LSTM output tensor. What others do to achieve better performance is to combine the last hidden state of the LSTM (which is the only thing you currently use in your model) with the attention vector as the representation.
Note: according to research, different types of attention give almost the same performance, so I recommend "Scaled Dot-Product Attention" because it is faster to compute.
input1 = Input(shape=(50, 1))
x1 = LSTM(100, return_sequences=True)(input1)
input2 = Input(shape=(25, 1))
x2 = LSTM(50, return_sequences=True)(input2)
rep_x1 = custom_layer()(x1)   # your custom attention layer, sketched below
rep_x2 = custom_layer()(x2)
x = concatenate([rep_x1, rep_x2], axis=1)
# output = (None, length of rep_x1 + length of rep_x2)
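For reference, here is a minimal sketch of what such a custom layer could look like: scaled dot-product attention pooling against a learned query vector. The class name AttentionPooling and all implementation details are my own choices (assuming the Keras 2 API), not the only way to do it; adapt the scoring function to your needs.
from keras import backend as K
from keras.layers import Layer

class AttentionPooling(Layer):
    # Collapses (batch, timesteps, features) into (batch, features) by
    # scaled dot-product attention against a learned query vector.
    def build(self, input_shape):
        d = int(input_shape[-1])
        self.query = self.add_weight(name='query', shape=(d, 1),
                                     initializer='glorot_uniform',
                                     trainable=True)
        super(AttentionPooling, self).build(input_shape)

    def call(self, inputs):
        d = K.int_shape(inputs)[-1]
        # dot product of every hidden state with the query, scaled by sqrt(d)
        scores = K.squeeze(K.dot(inputs, self.query), axis=-1)
        scores = scores / (float(d) ** 0.5)
        # softmax over timesteps, then weighted sum of the hidden states
        weights = K.expand_dims(K.softmax(scores), axis=-1)
        return K.sum(inputs * weights, axis=1)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[-1])
With this layer as custom_layer, rep_x1 = AttentionPooling()(x1) has shape (None, 100) and rep_x2 = AttentionPooling()(x2) has shape (None, 50), so the concatenated x is (None, 150).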