
I have constructed an LSTM architecture using Keras, but I am not certain whether duplicating time steps is a good approach to deal with variable sequence lengths.

I have a multidimensional data set with multi-feature sequences and varying numbers of time steps. It is multivariate time series data with multiple examples to train the LSTM on, and Y is either 0 or 1. Currently, I am duplicating the last time step of each sequence to ensure timesteps = 3.

I would appreciate it if someone could answer the following questions or concerns:

1. Is creating additional time steps with the feature values represented by zeros more suitable?
2. What is the right way to frame this problem: pad the sequences and mask them for evaluation?
3. I am duplicating the last time step in the Y variable as well for prediction; the value 1 in Y only appears at the last time step, if at all.

import numpy as np

# The input sequences are
trainX = np.array([
        [
            # Input features at timestep 1
            [1, 2, 3],
            # Input features at timestep 2
            [5, 2, 3] #<------ duplicate this to ensure compliance
        ],
        # Datapoint 2
        [
            # Features at timestep 1
            [1, 8, 9],
            # Features at timestep 2
            [9, 8, 9],
            # Features at timestep 3
            [7, 6, 1]
        ]
    ])

# The desired model outputs are as follows:
trainY = np.array([
        # Datapoint 1
        [
            # Target class at timestep 1
            [0],
            # Target class at timestep 2
            [1] #<---------- duplicate this to ensure compliance
        ],
        # Datapoint 2
        [
            # Target class at timestep 1
            [0],
            # Target class at timestep 2
            [0],
            # Target class at timestep 3
            [0]
        ]
    ])

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Flatten, Dense
from keras.constraints import maxnorm

timesteps = 3
model = Sequential()
model.add(LSTM(3, kernel_initializer ='uniform', return_sequences=True, batch_input_shape=(None, timesteps, trainX.shape[2]), 
               kernel_constraint=maxnorm(3), name='LSTM'))
model.add(Dropout(0.2))
model.add(LSTM(3, return_sequences=True, kernel_constraint=maxnorm(3), name='LSTM-2'))
model.add(Flatten(name='Flatten'))
model.add(Dense(timesteps, activation='sigmoid', name='Dense'))
model.compile(loss="mse", optimizer="sgd", metrics=["mse"])
model.fit(trainX, trainY, epochs=2000, batch_size=2)
predY = model.predict(testX)
Far

1 Answer


In my opinion there are two solutions to your problem (duplicating time steps is neither of them):

  1. Use pad_sequences in combination with a Masking layer. This is the common approach: thanks to padding, every sample has the same number of time steps. The good thing about this method is that it is very easy to implement, and the Masking layer will give you a small performance boost. The downside of this approach: if you train on a GPU, CuDNNLSTM is the layer to go for, since it is highly optimized for the GPU and therefore a lot faster, but it does not work with a Masking layer, and if your data set has a wide range of sequence lengths, you lose performance. (A minimal sketch follows right after this list.)

  2. Set your timesteps dimension to None and write a Keras generator that groups your batches by sequence length (I think you will also have to use the functional API). Now you can use CuDNNLSTM, and every sample is computed with only its relevant time steps (instead of padded ones), which is much more efficient. (A second sketch follows after the closing paragraph below.)
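
A minimal sketch of option 1, assuming trainX and trainY are kept as plain Python lists of per-sample time steps as shown in the question; apart from pad_sequences and Masking, the layer sizes, loss, and hyperparameters below are illustrative, not taken from the question:

import numpy as np
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense

maxlen = 3  # length of the longest sequence

# Pad the inputs and the per-timestep targets with zeros at the end.
paddedX = pad_sequences(trainX, maxlen=maxlen, dtype='float32', padding='post', value=0.0)
paddedY = pad_sequences(trainY, maxlen=maxlen, dtype='float32', padding='post', value=0.0)

model = Sequential()
# mask_value must equal the padding value so that padded steps are ignored downstream.
model.add(Masking(mask_value=0.0, input_shape=(maxlen, paddedX.shape[2])))
model.add(LSTM(3, return_sequences=True))
model.add(Dense(1, activation='sigmoid'))  # one probability per time step
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(paddedX, paddedY, epochs=10, batch_size=2)

Note that Masking only skips a time step when all of its feature values equal mask_value, so an all-zero "real" time step would also be masked.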

If you are new to Keras and performance is not so important, go with option 1. If you have a production environment where you often have to train the network and it is cost-relevant, try option 2.
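
And a rough sketch of option 2, assuming the same ragged trainX/trainY lists from the question; the generator bucket_by_length, the layer sizes, and the use of fit_generator are my own illustration of the idea rather than code from this answer, and CuDNNLSTM requires a GPU with cuDNN:

import numpy as np
from collections import defaultdict
from keras.models import Model
from keras.layers import Input, Dense, CuDNNLSTM

def make_model(n_features):
    # The timesteps dimension is None, so every batch may have its own length.
    inp = Input(shape=(None, n_features))
    x = CuDNNLSTM(3, return_sequences=True)(inp)
    out = Dense(1, activation='sigmoid')(x)
    model = Model(inp, out)
    model.compile(loss='binary_crossentropy', optimizer='adam')
    return model

def bucket_by_length(X, Y):
    """Yield batches in which all samples share the same number of time steps."""
    buckets = defaultdict(list)
    for x, y in zip(X, Y):
        buckets[len(x)].append((x, y))
    while True:  # Keras generators are expected to loop forever
        for samples in buckets.values():
            xb = np.array([s[0] for s in samples], dtype='float32')
            yb = np.array([s[1] for s in samples], dtype='float32')
            yield xb, yb

model = make_model(n_features=3)
steps = len(set(len(x) for x in trainX))  # one batch per distinct sequence length
model.fit_generator(bucket_by_length(trainX, trainY), steps_per_epoch=steps, epochs=10)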

dennis-w
  • Option 1 means that I would have to get rid of the Flatten layer, since it is not compatible with masking. Moreover, not padding the test data returns an error, so does it make a difference whether I pad the test data or not? – Far Aug 07 '18 at 21:51
  • The basic idea is that an LSTM layer can process data with varying timestep lengths, but not within the same batch. I would apply the same preprocessing as for the training data set, but you could also do something like option 2 or validate with a batch size of 1. I wonder why you want to use Flatten in the first place; normally you would just remove return_sequences from your last LSTM layer (a small sketch follows after these comments). But option 1 also works without Masking, so you will probably be fine even without masking. – dennis-w Aug 08 '18 at 07:11
  • @dennis-ec Could you please explain how to use padding and a masking layer in the case of MLPs? – DINA TAKLIT Mar 21 '19 at 17:01
  • @DINATAKLIT It might make more sense to post your own question explaining more about your problem and what you want to achieve. Right now I don't understand why you want to use padding in an MLP in the first place. – dennis-w Mar 22 '19 at 09:21
  • @dennis-ec I want to use zero padding with a masking layer in my MLPs because I have variable input lengths. The details of my question are here: https://stackoverflow.com/questions/55270074/tensor-flow-how-to-use-padding-and-masking-layer-in-case-of-mlps. Let me know if it's clear or if I need to provide more details. – DINA TAKLIT Mar 22 '19 at 12:47
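
A small sketch of what the second comment above suggests, i.e. dropping Flatten and leaving return_sequences at its default on the last LSTM layer so the model makes a single prediction per sequence; the layer sizes and loss are illustrative, not from the comment:

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
# timesteps and the 3 input features are taken from the question's setup.
model.add(LSTM(3, return_sequences=True, input_shape=(timesteps, 3)))
# No Flatten needed: with return_sequences left at its default of False,
# the second LSTM returns only its output for the last time step.
model.add(LSTM(3))
model.add(Dense(1, activation='sigmoid'))  # one prediction per sequence
model.compile(loss='binary_crossentropy', optimizer='adam')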