I have a data frame in R with 6000 rows and 90 variables. I want to predict the sales volume of product A for the next 12 months based on data about product A as well as competing products (B, C, D, ...). To use an LSTM, I need to reshape the data frame into the 3D format (samples, timesteps, features), but I don't quite understand how.
My df looks somewhat like this:
Date | Product | Sales | X1 | X2 | X3 | ... | X87 |
---|---|---|---|---|---|---|---|
2017-01-01 | A | 0.65438 | 0.45438 | -1.2670 | 0.3215 | ... | 1.35623 |
2017-01-01 | B | -0.55468 | 0.12436 | -1.5677 | -0.3215 | ... | 1.35623 |
2017-01-01 | C | 0.65981 | 1.12345 | -0.5574 | 0.3215 | ... | 1.35623 |
2017-02-01 | A | -0.12338 | -1.12345 | 0.4543 | -1.5673 | ... | 0.42961 |
... | ... | ... | ... | ... | ... | ... | ... |
2022-12-01 | C | 0.34568 | 1.134598 | 0.5678 | -1.2648 | ... | 0.34675 |
So far, I have split the data into training and test sets and normalized it. Then I ran:
```r
# Reshape to 3-dimensional array
train_data_lstm <- train_data %>%
  as.matrix() %>%
  array(dim = c(nrow(train_data), ncol(train_data), 1))

test_data_lstm <- test_data %>%
  as.matrix() %>%
  array(dim = c(nrow(test_data), ncol(test_data), 1))

# Prepare sequences
lookback <- 12  # how many steps back the model should look
```
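
For illustration, the target shape (samples, timesteps, features) could be reached with a sliding window roughly as in the sketch below. This is only one possible approach, under stated assumptions: the toy data (36 months, products A to C, `Sales` plus two features), the `horizon` variable, and the idea of first pivoting to one row per date are all made up for the example and are not taken from the real data. In this sketch `Date` is used only for ordering and is then dropped, while the product label ends up in the column names (e.g. `Sales.A`).

```r
# Toy stand-in for the real data (names and sizes are illustrative assumptions).
set.seed(1)
dates    <- seq(as.Date("2017-01-01"), by = "month", length.out = 36)
products <- c("A", "B", "C")
df <- expand.grid(Date = dates, Product = products, stringsAsFactors = FALSE)
df$Sales <- rnorm(nrow(df))
df$X1    <- rnorm(nrow(df))
df$X2    <- rnorm(nrow(df))

# 1. Long -> wide: one row per Date, one column per (variable, product) pair,
#    e.g. Sales.A, X1.A, X2.A, Sales.B, ...
wide <- reshape(df, idvar = "Date", timevar = "Product", direction = "wide")
wide <- wide[order(wide$Date), ]

# 2. Drop Date (it only defines the row order) and convert to a numeric matrix.
feat <- as.matrix(wide[, setdiff(names(wide), "Date")])

# 3. Sliding windows: x is (samples, timesteps, features),
#    y holds the next `horizon` values of product A's sales for each sample.
lookback  <- 12
horizon   <- 12
n_samples <- nrow(feat) - lookback - horizon + 1

x <- array(NA_real_, dim = c(n_samples, lookback, ncol(feat)))
y <- matrix(NA_real_, nrow = n_samples, ncol = horizon)

for (i in seq_len(n_samples)) {
  x[i, , ] <- feat[i:(i + lookback - 1), ]
  y[i, ]   <- feat[(i + lookback):(i + lookback + horizon - 1), "Sales.A"]
}

dim(x)  # 13 12 9
dim(y)  # 13 12
```

With arrays like these, the LSTM input shape would be `c(lookback, ncol(feat))`. Applied to the real data, the same windowing would presumably have to be done separately on the training and test periods so that no window spans the split.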
- How do I continue from here?
- Do I need to remove the Date variable? What about the product labels?
Thank you in advance!