
I'm trying to use an LSTM to forecast store sales. Here is what my raw data looks like:

|     Date   | StoreID | Sales | Temperature |  Open   | StoreType |
|------------|---------|-------|-------------|---------|-----------|
| 01/01/2016 |   1     |   0   |      36     |    0    |     1     |
| 01/02/2016 |   1     | 10100 |      42     |    1    |     1     |
| ...
| 12/31/2016 |   1     | 14300 |      39     |    1    |     1     |
| 01/01/2016 |   2     | 25000 |      46     |    1    |     3     |
| 01/02/2016 |   2     | 23700 |      43     |    1    |     3     |
| ...
| 12/31/2016 |   2     | 20600 |      37     |    1    |     3     |
| ...
| 12/31/2016 |   10    | 19800 |      52     |    1    |     2     |

I need to forecast sales for the next 10 days. In this example, I need to forecast store sales from 01-01-2017 to 01-10-2017. I know how to use other time series or regression models to solve this problem, but I want to know whether an RNN-LSTM is a good candidate for it.

I started by taking only the storeID=1 data to test the LSTM. If my data only has Date and Sales, I will construct my trainX and trainY this way (please correct me if I'm wrong):

Window = 20
Horizon = 10

| trainX                          | trainY                       |
|---------------------------------|------------------------------|
| [Yt-10, Yt-11, Yt-12,...,Yt-29] | [Yt, Yt-1, Yt-2,...,Yt-9]    |
| [Yt-11, Yt-12, Yt-13,...,Yt-30] | [Yt-1, Yt-2, Yt-3,...,Yt-10] |
| [Yt-12, Yt-13, Yt-14,...,Yt-31] | [Yt-2, Yt-3, Yt-4,...,Yt-11] |
...

After reshaping the two arrays:

trainX.shape
(300, 1, 20)
trainY.shape
(300, 10)

Question1: In this case, [samples, time steps, features] = [300, 1, 20]. Is this right? Or should I construct the sample as [300, 20, 1] ?

Question2: I do want to use other information in the raw data like Temperature, StoreType, etc. How should I construct my input data for LSTM?

Question3: So far we have only discussed forecasting for one store. If I want to forecast for all the stores, how should I construct my input data?

Currently I'm following examples from here, but they don't seem sufficient to cover my scenario. I really appreciate your help!

Marcin Możejko
Yijing Chen
  • I am struggling with the same issue at the moment. Let me know if you have any further info about it. Thanks, Chen :-) – joe Feb 05 '18 at 08:51

1 Answer


I was recently solving a similar problem. In your case:

  1. Input should have shape (300, 20, 1) - because you have time sequences of length 20 with 1 feature.

  2. You may do it like this:

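    from keras.layers import Input, LSTM, Dense, concatenate

    # Sales history: 20 time steps, 1 feature per step.
    sequential_input = Input(shape=(20, 1))
    # Non-sequential features (e.g. store type), one vector per sample.
    feature_input = Input(shape=(feature_nb,))
    lstm_layer = LSTM(lstm_units_1st_layer, return_sequences=True)(sequential_input)
    lstm_layer = LSTM(lstm_units_2nd_layer, return_sequences=True)(lstm_layer)
    ...
    # The last LSTM returns only its final output, so it can be joined with a flat vector.
    lstm_layer = LSTM(lstm_units_nth_layer, return_sequences=False)(lstm_layer)
    # Join the LSTM output with the non-sequential features.
    merged = concatenate([lstm_layer, feature_input])
    blend = Dense(blending_units_1st_layer, activation='relu')(merged)
    blend = Dense(blending_units_2nd_layer, activation='relu')(blend)
    ...
    output = Dense(10)(blend)  # one output per forecast day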
    
  3. This is the hardest part. I do not advise you to predict multiple shops by feeding them to a network as one feature vector. You may simply skip this part and try to predict different shops using one model, or postprocess the output using, e.g., some kind of graphical model or PCA on a matrix whose rows are daily sales.
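
For point 1, here is a minimal, dependency-free sketch of how the windows could be sliced so that each sample has 20 time steps with 1 feature. The helper name `make_windows` and the synthetic `sales` list are assumptions for illustration, and the time steps are ordered oldest to newest, which is a choice made here rather than something the question specifies.

```python
def make_windows(series, window=20, horizon=10):
    """Slice a 1-D sales series into (input window, forecast horizon) pairs."""
    X, Y = [], []
    # Each sample uses `window` past values to predict the next `horizon` values.
    for start in range(len(series) - window - horizon + 1):
        past = series[start:start + window]
        future = series[start + window:start + window + horizon]
        X.append([[value] for value in past])  # (window, 1): 20 time steps, 1 feature
        Y.append(list(future))                 # (horizon,): the 10 days to forecast
    return X, Y


# Hypothetical daily sales for one store (2016 has 366 days).
sales = [float(day) for day in range(366)]
X, Y = make_windows(sales)

# Nested-list dimensions: [samples, time steps, features] and [samples, horizon].
print(len(X), len(X[0]), len(X[0][0]))
print(len(Y), len(Y[0]))
```

Wrapping the results with `np.asarray(X)` and `np.asarray(Y)` gives arrays of shape `(samples, 20, 1)` and `(samples, 10)` that can be passed to the network.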

UPDATE:

To handle multiple sequential features, you could do the following:

    from keras.layers import Input, LSTM, Dense, concatenate
    from keras.models import Model

    # 20 time steps, several sequential features per step (e.g. sales, temperature).
    sequential_input = Input(shape=(20, nb_of_sequential_features))
    # Non-sequential features (e.g. store type), one vector per sample.
    feature_input = Input(shape=(feature_nb,))
    lstm_layer = LSTM(lstm_units_1st_layer, return_sequences=True)(sequential_input)
    lstm_layer = LSTM(lstm_units_2nd_layer, return_sequences=True)(lstm_layer)
    ...
    lstm_layer = LSTM(lstm_units_nth_layer, return_sequences=False)(lstm_layer)
    # Join the LSTM output with the non-sequential features.
    merged = concatenate([lstm_layer, feature_input])
    blend = Dense(blending_units_1st_layer, activation='relu')(merged)
    blend = Dense(blending_units_2nd_layer, activation='relu')(blend)
    ...
    output = Dense(10)(blend)
    model = Model(inputs=[sequential_input, feature_input], outputs=output)

In this case your input should be a list of two arrays: [sequential_data, features], where sequential_data.shape = (nb_of_examples, timesteps, sequential_features) and features.shape = (nb_of_examples, feature_nb). So sales and temperature should be stored in sequential_data, and store_type in features.
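
As a rough illustration of that layout, the sketch below builds both inputs for a single store in plain Python. The helper `build_inputs`, the `n_store_types=3` default, the one-hot encoding of the store type, and the synthetic data are all assumptions made for this example. Scaling is omitted.

```python
def build_inputs(sales, temperature, store_type, n_store_types=3,
                 window=20, horizon=10):
    """Build [sequential_data, features] and targets for one store."""
    # Assumed encoding: store types are numbered 1..n_store_types and one-hot encoded.
    one_hot = [1.0 if k == store_type - 1 else 0.0 for k in range(n_store_types)]
    sequential_data, features, targets = [], [], []
    for start in range(len(sales) - window - horizon + 1):
        stop = start + window
        # One row per time step, one column per sequential feature.
        sequential_data.append(
            [[sales[t], temperature[t]] for t in range(start, stop)]
        )
        # The static features are repeated once per sample, not per time step.
        features.append(list(one_hot))
        # The target is the next `horizon` days of sales.
        targets.append(list(sales[stop:stop + horizon]))
    return sequential_data, features, targets


# Hypothetical 60 days of data for a store of type 3.
days = 60
sales = [1000.0 + 10 * d for d in range(days)]
temperature = [40.0 + (d % 7) for d in range(days)]
sequential_data, features, targets = build_inputs(sales, temperature, store_type=3)

# (nb_of_examples, timesteps, sequential_features) and (nb_of_examples, feature_nb)
print(len(sequential_data), len(sequential_data[0]), len(sequential_data[0][0]))
print(len(features), len(features[0]))
```

After converting each list with `np.asarray`, the first two would be passed to the model together as `[sequential_data, features]`, with `targets` as the expected output.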

Marcin Możejko
  • Thanks a lot for the answers! For #1, I tried the option with (300, 20, 1) and the accuracy is a lot better than before (very slow, though)! For #2, I'm still very confused about the answer. Are we merging one LSTM layer with one feature layer? Is the feature layer another LSTM layer? Or are you suggesting that we should treat each X as a separate sequence and create one LSTM for each X sequence? And n=3? Why do we need multiple dense layers at the end? If you could explain the logic, that would be great! Really appreciate it again!!! – Yijing Chen Mar 05 '17 at 23:54
  • Ok. I finally understood - the weather information is also sequential, yes (you have a sequence of weathers - don't you?) – Marcin Możejko Mar 06 '17 at 13:17
  • 1
    Yes. The 'Temperature' column is a time based sequence. 'Open' and 'Store type' column are more like categorical feature of the store status. I guess I still not sure how should I add these extra features in the LSTM model. My guess is we can add features in the 3d tensor [samples, time steps, features], or create multiple layer for different sequence and merge them together? – Yijing Chen Mar 06 '17 at 22:39
  • Thanks a lot for updating the answer!! Just want to make sure that I understand. You are using LSTM to process all the sequential_data (like sales, temp, holiday). After done with that, we merge what we learned from the sequence with the features that are not time based. In the end we are using multiple Dense layer to process all the information together like a fully connected feedforward net. Is this the same as building one RNN model for sequence data, and then use the output with extra info to build another CNN model? – Yijing Chen Mar 09 '17 at 22:50
  • How should the input data be constructed? Does it have to be prepared as a timestep series as well? – user2340706 Oct 27 '17 at 18:54
  • @MarcinMożejko regarding 3: why is it not advised to combine sales data from different stores? Is it because in this case there would be multiple series within the same date range (1/1/16 to 12/31/16)? What if I create an input of shape (600, 20, 1) with the first 300 samples for store 1 and the next 300 for store 2? Will this confuse the model, since after the 300th entry the time jumps back to 1/1/16? I couldn't quite understand what you meant by "predict different shops using one model": why do we need to predict the shop if we are able to predict the series correctly? – vishnu viswanath Nov 12 '17 at 05:01
  • @YijingChen What strategy did you end up using for #2 and #3, in terms of multiple features and groups (stores)? – panacherie Mar 23 '23 at 23:03