21

I'm currently trying to build a simple model for predicting time series. The goal is to train the model on a sequence so that it is able to predict future values.

I'm using TensorFlow and LSTM cells to do so. The model is trained with truncated backpropagation through time. My question is how to structure the data for training.

For example, let's assume we want to learn the given sequence:

[1,2,3,4,5,6,7,8,9,10,11,...]

And we unroll the network for num_steps=4.

Option 1

input data               label     
1,2,3,4                  2,3,4,5
5,6,7,8                  6,7,8,9
9,10,11,12               10,11,12,13
...

Option 2

input data               label     
1,2,3,4                  2,3,4,5
2,3,4,5                  3,4,5,6
3,4,5,6                  4,5,6,7
...

Option 3

input data               label     
1,2,3,4                  5
2,3,4,5                  6
3,4,5,6                  7
...

Option 4

input data               label     
1,2,3,4                  5
5,6,7,8                  9
9,10,11,12               13
...
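
Just to make the options concrete, this is how I would build the (input, label) pairs with NumPy (a small sketch; the variable names are only illustrative):

import numpy as np

series = np.arange(1, 21)   # the sequence 1, 2, ..., 20
num_steps = 4

# Options 2 and 3: overlapping (sliding) windows
x = np.array([series[i:i + num_steps] for i in range(len(series) - num_steps)])
y_opt2 = np.array([series[i + 1:i + num_steps + 1] for i in range(len(series) - num_steps)])
y_opt3 = series[num_steps:]          # a single next value per window

# Options 1 and 4: the same, but with non-overlapping windows (stride = num_steps)
x_opt1 = x[::num_steps]
y_opt1 = y_opt2[::num_steps]
y_opt4 = y_opt3[::num_steps]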

Any help would be appreciated.

Jakob
  • Among the options listed, it seems to me that option 3 would be the most reasonable one if you may indeed assume that 4 past values are enough, to a good degree of approximation, to predict the present value (so it is more about the data than about the particular method you use for prediction). – John Donn Mar 12 '16 at 18:10
  • Of course I use more than the past 4 values, this is just a small example for easier demonstration. Also feel free to suggest another option than the 4 presented. – Jakob Mar 14 '16 at 09:09

3 Answers

5

I'm just about to learn LSTMs in TensorFlow and am trying to implement an example which (luckily) tries to predict a time series / number series generated by a simple math function.

But I'm using a different way to structure the data for training, motivated by Unsupervised Learning of Video Representations using LSTMs:

LSTM Future Predictor Model

Option 5:

input data               label     
1,2,3,4                  5,6,7,8
2,3,4,5                  6,7,8,9
3,4,5,6                  7,8,9,10
...

Besides this paper, I tried to take inspiration from the given TensorFlow RNN examples. My current complete solution looks like this:

import math
import random
import numpy as np
import tensorflow as tf

LSTM_SIZE = 64
LSTM_LAYERS = 2
BATCH_SIZE = 16
NUM_T_STEPS = 4
MAX_STEPS = 1000
LAMBDA_REG = 5e-4


def ground_truth_func(i, j, t):
    return i * math.pow(t, 2) + j


def get_batch(batch_size):
    seq = np.zeros([batch_size, NUM_T_STEPS, 1], dtype=np.float32)
    tgt = np.zeros([batch_size, NUM_T_STEPS], dtype=np.float32)

    for b in xrange(batch_size):
        i = float(random.randint(-25, 25))
        j = float(random.randint(-100, 100))
        for t in xrange(NUM_T_STEPS):
            value = ground_truth_func(i, j, t)
            seq[b, t, 0] = value

        for t in xrange(NUM_T_STEPS):
            tgt[b, t] = ground_truth_func(i, j, t + NUM_T_STEPS)
    return seq, tgt


# Placeholder for the inputs in a given iteration
sequence = tf.placeholder(tf.float32, [BATCH_SIZE, NUM_T_STEPS, 1])
target = tf.placeholder(tf.float32, [BATCH_SIZE, NUM_T_STEPS])

fc1_weight = tf.get_variable('w1', [LSTM_SIZE, 1], initializer=tf.random_normal_initializer(mean=0.0, stddev=1.0))
fc1_bias = tf.get_variable('b1', [1], initializer=tf.constant_initializer(0.1))

# ENCODER
with tf.variable_scope('ENC_LSTM'):
    lstm = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE)
    multi_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * LSTM_LAYERS)
    initial_state = multi_lstm.zero_state(BATCH_SIZE, tf.float32)
    state = initial_state
    for t_step in xrange(NUM_T_STEPS):
        if t_step > 0:
            tf.get_variable_scope().reuse_variables()

        # state value is updated after processing each batch of sequences
        output, state = multi_lstm(sequence[:, t_step, :], state)

learned_representation = state

# DECODER
with tf.variable_scope('DEC_LSTM'):
    lstm = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE)
    multi_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * LSTM_LAYERS)
    state = learned_representation
    logits_stacked = None
    loss = 0.0
    for t_step in xrange(NUM_T_STEPS):
        if t_step > 0:
            tf.get_variable_scope().reuse_variables()

        # state value is updated after processing each batch of sequences
        output, state = multi_lstm(sequence[:, t_step, :], state)
        # output can be used to make next number prediction
        logits = tf.matmul(output, fc1_weight) + fc1_bias

        if logits_stacked is None:
            logits_stacked = logits
        else:
            logits_stacked = tf.concat(1, [logits_stacked, logits])

        loss += tf.reduce_sum(tf.square(logits - target[:, t_step])) / BATCH_SIZE

reg_loss = loss + LAMBDA_REG * (tf.nn.l2_loss(fc1_weight) + tf.nn.l2_loss(fc1_bias))

train = tf.train.AdamOptimizer().minimize(reg_loss)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())

    total_loss = 0.0
    for step in xrange(MAX_STEPS):
        seq_batch, target_batch = get_batch(BATCH_SIZE)

        feed = {sequence: seq_batch, target: target_batch}
        _, current_loss = sess.run([train, reg_loss], feed)
        if step % 10 == 0:
            print("@{}: {}".format(step, current_loss))
        total_loss += current_loss

    print('Total loss:', total_loss)

    print('### SIMPLE EVAL: ###')
    seq_batch, target_batch = get_batch(BATCH_SIZE)
    feed = {sequence: seq_batch, target: target_batch}
    prediction = sess.run([logits_stacked], feed)
    for b in xrange(BATCH_SIZE):
        print("{} -> {})".format(str(seq_batch[b, :, 0]), target_batch[b, :]))
        print(" `-> Prediction: {}".format(prediction[0][b]))

Sample output looks like this:

### SIMPLE EVAL: ###
# [input seq] -> [target prediction]
#  `-> Prediction: [model prediction]  
[  33.   53.  113.  213.] -> [  353.   533.   753.  1013.])
 `-> Prediction: [ 19.74548721  28.3149128   33.11489105  35.06603241]
[ -17.  -32.  -77. -152.] -> [-257. -392. -557. -752.])
 `-> Prediction: [-16.38951683 -24.3657589  -29.49801064 -31.58583832]
[ -7.  -4.   5.  20.] -> [  41.   68.  101.  140.])
 `-> Prediction: [ 14.14126873  22.74848557  31.29668617  36.73633194]
...

The model is an LSTM autoencoder with 2 layers each in the encoder and the decoder.

Unfortunately, as you can see in the results, this model does not learn the sequence properly. It might be the case that I'm just making a bad mistake somewhere, or that 1000-10000 training steps are just way too few for an LSTM. As I said, I'm also just starting to understand/use LSTMs properly. But hopefully this can give you some inspiration regarding the implementation.

b3nk4n
  • I'm currently using Option 2 with some success. What makes me question your approach is that the model does not "see" the data in order. As far as I understand, the internal state of the network is influenced by all values the model has "seen" so far. Therefore, if you start a new sequence, you have to reset the internal state. In the form you feed the data, the model sees a lot of repetition. But I could be wrong, I'm not sure yet. – Jakob May 31 '16 at 07:15
  • Thank you for that hint. I never thought about resetting the state for each new sequence to learn. I will check this out later today. Additionally, I have seen that I made a mistake in the decoder LSTM: here, I accidentally use the same input sequence as in the encoder LSTM, which is wrong. What I wanted to do is use the output of the last LSTM cell (t-1) as the input of the current cell (t). – b3nk4n May 31 '16 at 07:30
  • I just checked it. In the code posted above, the initial state is a zero tensor in every iteration, so it should be fine. Nevertheless, I don't know why it is still not learning anything useful... – b3nk4n May 31 '16 at 07:59
  • @bsautermeister, did you ever get anywhere with this? I'm looking into doing pretty much the same thing, but there's so much content out there that I have gotten lost. – GLaDER Feb 20 '18 at 14:11
  • @GLaDER yes, I did! I used such an encoder-decoder architecture in my Master's Thesis project for video frame prediction: http://bsautermeister.de/research/frame-prediction/ There you can also find a link to the source code. – b3nk4n Apr 06 '18 at 09:48
4

After reading several LSTM introduction blogs, e.g. Jakob Aungiers', option 3 seems to be the right one for a stateless LSTM.

If your LSTM needs to remember data further back than your num_steps, you can train in a stateful way - for a Keras example see Philippe Remy's blog post "Stateful LSTM in Keras". Philippe does not show an example for a batch size greater than one, however. I guess that in your case a batch size of four with a stateful LSTM could be used with the following data (written as input -> label):

batch #0:
1,2,3,4 -> 5
2,3,4,5 -> 6
3,4,5,6 -> 7
4,5,6,7 -> 8

batch #1:
5,6,7,8 -> 9
6,7,8,9 -> 10
7,8,9,10 -> 11
8,9,10,11 -> 12

batch #2:
9,10,11,12 -> 13
...

This way, the state of, e.g., the 2nd sample in batch #0 is correctly reused to continue training with the 2nd sample of batch #1.

This is somewhat similar to your option 4; however, you are not using all available labels there.
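
A minimal Keras sketch of this (assuming standalone Keras; the layer size and series are just illustrative, and input scaling is omitted) that trains statefully on exactly the batch layout above:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

num_steps, batch_size = 4, 4
series = np.arange(1, 101, dtype=np.float32)

# all overlapping windows: 1,2,3,4 -> 5; 2,3,4,5 -> 6; ...
x = np.array([series[i:i + num_steps] for i in range(len(series) - num_steps)])
x = x.reshape(-1, num_steps, 1)
y = series[num_steps:].reshape(-1, 1)

model = Sequential([
    LSTM(32, batch_input_shape=(batch_size, num_steps, 1), stateful=True),
    Dense(1),
])
model.compile(loss='mse', optimizer='adam')

for epoch in range(10):
    model.reset_states()                      # fresh state at the start of each pass
    for b in range(len(x) // batch_size):
        xb = x[b * batch_size:(b + 1) * batch_size]   # batch #b as listed above
        yb = y[b * batch_size:(b + 1) * batch_size]
        model.train_on_batch(xb, yb)          # state of sample k in batch #b carries
                                              # over to sample k in batch #(b+1)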

Update:

Extending my suggestion where batch_size equals num_steps, Alexis Huet gives an answer for the case of batch_size being a divisor of num_steps, which can be used for larger num_steps. He describes it nicely on his blog.

Robert Pollak
  • The answer https://stackoverflow.com/a/48588730/1389680 supports my suggestion about stateful training with multi-sample batches. – Robert Pollak Apr 13 '18 at 12:12
1

I believe Option 1 is closest to the reference implementation in /tensorflow/models/rnn/ptb/reader.py

import numpy as np


def ptb_iterator(raw_data, batch_size, num_steps):
  """Iterate on the raw PTB data.

  This generates batch_size pointers into the raw PTB data, and allows
  minibatch iteration along these pointers.

  Args:
    raw_data: one of the raw data outputs from ptb_raw_data.
    batch_size: int, the batch size.
    num_steps: int, the number of unrolls.

  Yields:
    Pairs of the batched data, each a matrix of shape [batch_size, num_steps].
    The second element of the tuple is the same data time-shifted to the
    right by one.

  Raises:
    ValueError: if batch_size or num_steps are too high.
  """
  raw_data = np.array(raw_data, dtype=np.int32)

  data_len = len(raw_data)
  batch_len = data_len // batch_size
  data = np.zeros([batch_size, batch_len], dtype=np.int32)
  for i in range(batch_size):
    data[i] = raw_data[batch_len * i:batch_len * (i + 1)]

  epoch_size = (batch_len - 1) // num_steps

  if epoch_size == 0:
    raise ValueError("epoch_size == 0, decrease batch_size or num_steps")

  for i in range(epoch_size):
    x = data[:, i*num_steps:(i+1)*num_steps]
    y = data[:, i*num_steps+1:(i+1)*num_steps+1]
    yield (x, y)
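
As a quick sanity check (hypothetical usage, not part of the reference file), feeding the sequence from the question with batch_size=1 and num_steps=4 reproduces exactly the Option 1 pairs:

raw = list(range(1, 30))          # 1, 2, 3, ...
for x, y in ptb_iterator(raw, batch_size=1, num_steps=4):
    print(x[0], '->', y[0])
# [1 2 3 4] -> [2 3 4 5]
# [5 6 7 8] -> [6 7 8 9]
# ...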

However, another option is to select a pointer into your data array randomly for each training sequence.
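
A small sketch of that random-pointer idea (the helper name is just illustrative):

def random_batch(raw_data, batch_size, num_steps):
    # pick a random start index per training example and slice a window from it
    raw_data = np.asarray(raw_data)
    x = np.zeros([batch_size, num_steps], dtype=raw_data.dtype)
    y = np.zeros([batch_size, num_steps], dtype=raw_data.dtype)
    for b in range(batch_size):
        start = np.random.randint(0, len(raw_data) - num_steps)
        x[b] = raw_data[start:start + num_steps]
        y[b] = raw_data[start + 1:start + num_steps + 1]   # shifted by one, as above
    return x, y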

j314erre