How to build a multi-layer LSTM in Tensorflow for a video sequence?

Question

I want to build a 3-layer LSTM in TensorFlow for video analysis. I have read some examples online, but I am still confused. Could anyone help by writing a concise code snippet for the task below?

Input: 5 consecutive video frames, each 240x320 pixels

Output: 5 scalars

Thank you so much.

Can you please post the example or your effort? This will help us in better understanding your problem. — Beta, Feb 26 '18 at 05:16

score 1 · Answer 1 · answered Jun 21 '18 at 20:18

Basically, you have to prepare your frames as a sequence. You should have a tensor of shape (batch_size, sequence_length = 5, features = 240*320), i.e. each frame flattened into a vector of 76800 values. Then create your 3 stacked LSTM layers using:

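# Assuming TensorFlow 1.x, where `rnn` refers to tensorflow.contrib.rnn
from tensorflow.contrib import rnn
layer1 = rnn.BasicLSTMCell(number_lstm_units)
layer2 = rnn.BasicLSTMCell(number_lstm_units)
layer3 = rnn.BasicLSTMCell(number_lstm_units)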

Group the cells in a list and pass it to a MultiRNNCell:

cells = [layer1, layer2, layer3]
multirnn = rnn.MultiRNNCell(cells)
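As a rough illustration of the (batch_size, sequence_length, features) layout described at the start of this answer, here is a minimal NumPy sketch. The random array stands in for real video data, and the names frames, sequence_input and targets are hypothetical, chosen only for this example.

```python
import numpy as np

# Hypothetical stand-in for real video data:
# 2 clips, each with 5 grayscale frames of 240x320 pixels.
batch_size, sequence_length, height, width = 2, 5, 240, 320
frames = np.random.rand(batch_size, sequence_length, height, width).astype(np.float32)

# Flatten every frame into one feature vector (240 * 320 = 76800 values),
# keeping the batch and time axes intact.
sequence_input = frames.reshape(batch_size, sequence_length, height * width)

# One target scalar per frame, matching the "5 scalars" output in the question.
targets = np.zeros((batch_size, sequence_length), dtype=np.float32)

print(sequence_input.shape)
print(targets.shape)
```

Reshaping keeps the frame order, so time step t of sequence_input still corresponds to frame t of the clip.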

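Then, with your flattened features, you only have to pass each time step through the stacked cell, starting from a zero state (this assumes tensorflow is imported as tf):

state = multirnn.zero_state(batch_size, tf.float32)
for feature in tf.unstack(your_flattened_vector, axis=1):  # one time step at a time
    lstm_output, state = multirnn(feature, state)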

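At each time step you get an output of shape (batch_size, number_lstm_units), so five outputs for five frames. To turn each of them into a single scalar, add a dense layer with one output unit on top of every time step's output.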
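To make the shapes concrete, here is a plain NumPy sketch of what the loop does for a three-layer stack. It is not TensorFlow's implementation: the weights are random and untrained, number_lstm_units = 8 is an arbitrary choice, and the projection w_out is an assumption added here to map each step's output to the one scalar per frame that the question asks for.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    """One LSTM time step. x: (batch, n_in); h, c: (batch, units)."""
    units = h.shape[1]
    z = np.concatenate([x, h], axis=1) @ W + b   # (batch, 4 * units)
    i = sigmoid(z[:, :units])                    # input gate
    f = sigmoid(z[:, units:2 * units])           # forget gate
    o = sigmoid(z[:, 2 * units:3 * units])       # output gate
    g = np.tanh(z[:, 3 * units:])                # candidate cell state
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
batch_size, sequence_length, features = 2, 5, 240 * 320
number_lstm_units = 8  # hypothetical size, kept small for illustration

# Layer 1 reads the flattened frame; layers 2 and 3 read the layer below.
layer_inputs = [features, number_lstm_units, number_lstm_units]
weights = [rng.normal(0.0, 0.01, size=(n_in + number_lstm_units, 4 * number_lstm_units))
           for n_in in layer_inputs]
biases = [np.zeros(4 * number_lstm_units) for _ in layer_inputs]
w_out = rng.normal(0.0, 0.1, size=(number_lstm_units, 1))  # assumed scalar projection

x = rng.random((batch_size, sequence_length, features))  # stand-in for real frames
states = [(np.zeros((batch_size, number_lstm_units)),
           np.zeros((batch_size, number_lstm_units))) for _ in layer_inputs]

outputs = []
for t in range(sequence_length):
    layer_in = x[:, t, :]                  # features of frame t
    for k in range(len(layer_inputs)):
        h, c = lstm_step(layer_in, states[k][0], states[k][1], weights[k], biases[k])
        states[k] = (h, c)                 # carry each layer's state to the next step
        layer_in = h                       # output of layer k feeds layer k + 1
    outputs.append(layer_in @ w_out)       # (batch, 1): one scalar per frame

predictions = np.concatenate(outputs, axis=1)  # (batch, 5)
print(predictions.shape)
```

The top layer's output at each step has number_lstm_units values per example, and the final projection reduces it to one value, giving five scalars per clip.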

For additional info check the API here.

Hope it helped.
