
I want to create a basic LSTM network that accepts sequences of 5-dimensional vectors (for example, as N x 5 arrays) and returns the corresponding sequences of 4-dimensional hidden and cell vectors (N x 4 arrays), where N is the number of time steps.

How can I do this in TensorFlow?

ADDED

So far, I have the following code working:

import numpy as np
import tensorflow as tf

num_units = 4
lstm = tf.nn.rnn_cell.LSTMCell(num_units=num_units)

timesteps = 18
num_input = 5
# Batch of sequences: [batch_size, timesteps, features]
X = tf.placeholder("float", [None, timesteps, num_input])
# static_rnn expects a length-timesteps list of [batch_size, features] tensors
x = tf.unstack(X, timesteps, 1)
outputs, states = tf.contrib.rnn.static_rnn(lstm, x, dtype=tf.float32)

sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

x_val = np.random.normal(size=(12, 18, 5))
res = sess.run(outputs, feed_dict={X: x_val})
sess.close()

However, there are many open questions:

  1. Why is the number of time steps preset? Shouldn't an LSTM be able to accept sequences of arbitrary length?
  2. Why do we split the data by time steps (using unstack)?
  3. How should the "outputs" and "states" be interpreted?
Roman

1 Answer


Why is the number of time steps preset? Shouldn't an LSTM be able to accept sequences of arbitrary length?

If you want to accept sequences of arbitrary length, I recommend using dynamic_rnn. You can refer here to understand the difference between them.

For example:

import numpy as np
import tensorflow as tf

num_units = 4
lstm = tf.nn.rnn_cell.LSTMCell(num_units=num_units)

num_input = 5
# Both batch size and number of time steps are left as None (dynamic)
X = tf.placeholder("float", [None, None, num_input])
outputs, states = tf.nn.dynamic_rnn(lstm, X, dtype=tf.float32)

sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

# The same graph handles sequences of length 18 ...
x_val = np.random.normal(size=(12, 18, 5))
res = sess.run(outputs, feed_dict={X: x_val})

# ... and sequences of length 16, without rebuilding anything
x_val = np.random.normal(size=(12, 16, 5))
res = sess.run(outputs, feed_dict={X: x_val})
sess.close()

dynamic_rnn requires that all sequences in one batch have the same length, but when you need arbitrary lengths within a batch, you can pad the batch data and then pass each sequence's true length via the sequence_length parameter, as in the sketch below.
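Here is a minimal sketch of that idea (the sizes and lengths are made up for illustration): two sequences with true lengths 3 and 2 are zero-padded to a common length 3, and sequence_length tells dynamic_rnn where each one really ends, so the outputs past a sequence's end come back as zeros.

import numpy as np
import tensorflow as tf

num_units = 4
num_input = 5
lstm = tf.nn.rnn_cell.LSTMCell(num_units=num_units)

X = tf.placeholder("float", [None, None, num_input])
seq_len = tf.placeholder(tf.int32, [None])
outputs, states = tf.nn.dynamic_rnn(lstm, X, sequence_length=seq_len,
                                    dtype=tf.float32)

# Two sequences zero-padded to the common length 3;
# their true lengths are 3 and 2
x_val = np.random.normal(size=(2, 3, num_input))
x_val[1, 2, :] = 0.0  # padding step of the second sequence

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out = sess.run(outputs, feed_dict={X: x_val, seq_len: [3, 2]})
    print(out[1, 2])  # all zeros: past the end of the second sequence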

Why do we split the data by time steps (using unstack)?

Only static_rnn needs the data split with unstack; this comes down to their different input requirements. static_rnn expects a length-timesteps list of 2D tensors of shape [batch_size, features]. But dynamic_rnn takes a single 3D tensor, of shape either [timesteps, batch_size, features] or [batch_size, timesteps, features], depending on whether time_major is True or False.
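As a quick sanity check of those shapes (a sketch using the same toy sizes as above):

import tensorflow as tf

X = tf.placeholder("float", [None, 18, 5])  # [batch_size, timesteps, features]
x = tf.unstack(X, 18, 1)                    # the list form that static_rnn wants

print(len(x))        # 18 tensors, one per time step
print(x[0].shape)    # (?, 5), i.e. [batch_size, features]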

How should the "outputs" and "states" be interpreted?

In LSTMCell, states has shape [2, batch_size, num_units]: it is a pair in which one [batch_size, num_units] tensor is the cell state C and the other is the hidden state h. You can see the pictures below.

[Figure: LSTM cell diagrams showing the cell state C and the hidden state h]

In the same way, for GRUCell the shape of states is just [batch_size, num_units].

outputs contains the output of every time step, so by default (time_major=False) its shape is [batch_size, timesteps, num_units]. And you can easily verify that states[1] (the h part) equals outputs[:, -1, :], i.e. the last-step output is the final hidden state.
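A minimal sketch verifying that relationship (reusing the dynamic_rnn setup from above; no sequence_length is passed, so the last step is the final one for every sequence):

import numpy as np
import tensorflow as tf

lstm = tf.nn.rnn_cell.LSTMCell(num_units=4)
X = tf.placeholder("float", [None, None, 5])
outputs, states = tf.nn.dynamic_rnn(lstm, X, dtype=tf.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out, st = sess.run([outputs, states],
                       feed_dict={X: np.random.normal(size=(12, 18, 5))})
    # st is an LSTMStateTuple: st.c is C, st.h is h
    print(np.allclose(st.h, out[:, -1, :]))  # True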

giser_yugang