
I have an LSTM model (Keras) that takes as input the past 20 values of 6 variables and predicts the next 4 values for 3 of those variables. In other words, I have 6 time series and I'm trying to predict their future values using their 20 past values. The basic code is:

from tensorflow.keras.layers import Input, LSTM, Dropout, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.constraints import NonNeg

n_variables = 6        # number of input time series
past_time_steps = 20   # length of the input window
future_time_steps = 4  # number of future values to predict
hid = 64               # hidden units (a value isn't specified here)

inputs = Input(shape=(past_time_steps, n_variables))
m = LSTM(hid, return_sequences=True)(inputs)
m = Dropout(0.5)(m)
m = LSTM(hid)(m)
m = Dropout(0.5)(m)
outputA = Dense(future_time_steps, activation='linear', kernel_constraint=NonNeg())(m)
outputB = Dense(future_time_steps, activation='linear', kernel_constraint=NonNeg())(m)
outputC = Dense(future_time_steps, activation='linear', kernel_constraint=NonNeg())(m)
model = Model(inputs=[inputs], outputs=[outputA, outputB, outputC])
model.compile(optimizer='adam', loss='mae')
model.fit(x, [y1, y2, y3])

So, the input is a NumPy array with shape (500, 20, 6), where 500 is the number of samples (i.e., training time series).

Now I have new data available: for each time series I have a categorical variable (that can take 6 values: 0, 1, 2, 3, 4, 5). How can I add this information to the model? Can I add another layer that uses this variable? Should I pad this variable at the beginning/end of each time series, so that I'd have an input matrix with shape (500, 21, 6)?

Titus Pullo

2 Answers


One-hot encode the categorical variable and preprocess it the same way as your other temporal data, repeating it at every timestep (see the sketch below). Your timesteps are not affected by this new data; only the number of variables changes.
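For example, with the question's shapes (a sketch; x, cat, and x_aug are illustrative names, and cat holds one label in {0, ..., 5} per sample):

import numpy as np

n_categories = 6
one_hot = np.eye(n_categories)[cat]                         # (500, 6)
one_hot = np.repeat(one_hot[:, np.newaxis, :], 20, axis=1)  # (500, 20, 6): same value at every timestep
x_aug = np.concatenate([x, one_hot], axis=-1)               # (500, 20, 12)

# The Input layer then becomes Input(shape=(20, 12)); the 20 timesteps are unchanged.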

Just_Learning
  • How would you encode it if, say, some observations span 2 years and others 2 weeks? If you encode them across 2 years it would be extremely sparse. It would be great if you elaborated your answer. – Areza Jan 08 '20 at 19:58

This thread might interest you: Adding Features To Time Series Model LSTM.

You basically have three options. Let's take an example with weather data from two different cities, Paris and San Francisco: you want to predict the next temperature based on historical data, but at the same time you expect the weather to change based on the city. You can either:

  • Combine the auxiliary features with the time series data, at the beginning or at the end (ugly!).
  • Concatenate the auxiliary features with the output of the RNN layer (a sketch follows below). It's a kind of post-RNN adjustment, since the RNN layer won't see this auxiliary info.
  • Or just initialize the RNN states with a learned representation of the condition (e.g. Paris or San Francisco); a sketch of this idea follows further down.
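Here is a minimal sketch of the second option in plain Keras. The shapes follow the question's setup (20 timesteps, 6 series, a 6-valued condition); the layer size and variable names are illustrative:

from tensorflow.keras.layers import Input, LSTM, Dense, Concatenate
from tensorflow.keras.models import Model

series_in = Input(shape=(20, 6))   # 20 timesteps, 6 time series
cond_in = Input(shape=(6,))        # one-hot encoded condition (e.g. the city)

h = LSTM(64)(series_in)            # 64 units is an arbitrary illustrative choice
h = Concatenate()([h, cond_in])    # post-RNN adjustment: the LSTM never sees the condition
out = Dense(4, activation='linear')(h)

model = Model(inputs=[series_in, cond_in], outputs=out)
model.compile(optimizer='adam', loss='mae')
# model.fit([x, cond_onehot], y)   # two inputs; array names are illustrative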

I wrote a library to condition on auxiliary inputs. It abstracts all the complexity and has been designed to be as user-friendly as possible:

https://github.com/philipperemy/cond_rnn/

The implementation is in TensorFlow (>=1.13.1) and Keras.
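For the third option, the core idea behind cond_rnn can be illustrated in plain Keras by learning the LSTM's initial states from the condition. This is a simplified sketch of the concept, not the library's actual API; sizes and names are illustrative:

from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

units = 64
series_in = Input(shape=(20, 6))   # the time series
cond_in = Input(shape=(6,))        # one-hot condition, e.g. Paris vs. San Francisco

# Learn the initial hidden state h0 and cell state c0 from the condition,
# instead of starting the LSTM from zero vectors.
h0 = Dense(units)(cond_in)
c0 = Dense(units)(cond_in)

h = LSTM(units)(series_in, initial_state=[h0, c0])
out = Dense(4, activation='linear')(h)

model = Model(inputs=[series_in, cond_in], outputs=out)
model.compile(optimizer='adam', loss='mae')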

Hope it helps!

Philippe Remy
  • Can you elaborate on how your answer relates to the categorical variable? – Areza Jan 08 '20 at 19:54
  • 1
    From my understanding, the important thing here is not so much about categorical variable but how to feed an exogenous variable that does not depend on time to a LSTM model. – Philippe Remy Jan 09 '20 at 01:17
  • So if I was interested in predicting early-onset disease, and I have many non-time-series variables such as conditions appearing in a medical record at various points in time, would this implementation allow me to encode these events within the LSTM "as they happen", i.e. the LSTM recognizes that high cholesterol, *closely followed* in the record by high blood pressure and chest pain, is indicative of a heart attack? Or does this implementation take the entire medical history, regardless of the order of events, and merge it with the final layer? – brucezepplin Jun 01 '21 at 14:35
  • @brucezepplin Any "standard" RNN unrolls a time series on the time axis to produce a vector H(n) to be passed to the next layer. The initial state H(0) is usually a vector of zeros. With CondRNN, any external condition (that does not depend on time) is used to initialize the state H(0) in a smarter way than a vector of zeros. When you train a CondRNN, you pass the time series X as usual, as well as a vector of conditions C. – Philippe Remy Mar 17 '23 at 01:28