I have a NumPy array of about 5000 rows and 4 columns (temp, pressure, speed, cost), so its shape is (5000, 4). Each row is an observation taken at a regular interval. This is the first time I'm doing time series prediction, and I'm stuck on the input shape. I'm trying to predict a value one timestep after the last data point. How do I reshape the array into the 3D form that an LSTM model in Keras expects?

A small sample program would also be very helpful. There doesn't seem to be any example or tutorial where the input has more than one feature (and is not NLP).

user2559578
  • Are all four observations input variables or is one of them an output? – DJK Dec 22 '17 at 18:10
  • There are 4 columns (temp, pressure, speed, cost) and I want to predict a future value of one of them (mostly the first column, temp) using the past values of all the columns including the temp, if it makes sense. – user2559578 Dec 22 '17 at 18:32
  • Look at [this post](https://stackoverflow.com/questions/45764629/machine-learning-how-to-use-the-past-20-rows-as-an-input-for-x-for-each-y-value/45765082#45765082) first and see if it helps you understand how to reshape the data for an LSTM. If it does, I could write an answer, but the questions are pretty similar – DJK Dec 22 '17 at 19:01

1 Answer

The first question you should ask yourself is:

  • Over what timescale do the input features carry information relevant to the value you want to predict?

Let's call this timescale prediction_context, measured in timesteps (rows of your array).

You can now create your dataset:

import numpy as np

recording_length = 5000
n_features = 4
prediction_context = 10  # Number of past timesteps per example; change here

# Stand-ins for the data you already have
X_data = np.random.random((recording_length, n_features))
to_predict = np.random.random((recording_length, 1))

# Build lists of training examples (input window and expected output)
X_in = []
Y_out = []
for i in range(recording_length - prediction_context):
    # Input: prediction_context consecutive rows starting at row i
    X_in.append(X_data[i:i + prediction_context, :])
    # Target: the value one timestep after the end of that window
    Y_out.append(to_predict[i + prediction_context])

# Convert the lists to numpy arrays
X_train = np.array(X_in)
Y_train = np.array(Y_out)

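At the end:
X_train.shape == (recording_length - prediction_context, prediction_context, n_features)
This is the 3D (samples, timesteps, features) layout that a Keras LSTM layer expects. You will need to make a trade-off between the length of your prediction context and the number of examples you have to train your network: a longer context leaves fewer training examples.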
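If the value to predict is itself one of the columns of the input array (for instance temp in column 0, as described in the comments), the same windowing can be written as a small helper. This is only a minimal sketch: the function name make_windows, its target_col parameter, and the random stand-in data are assumptions made for illustration, not part of any library.

```python
import numpy as np

def make_windows(data, context, target_col=0):
    """Slice a (n_rows, n_features) array into overlapping windows.

    Returns X with shape (n_rows - context, context, n_features) and
    y with shape (n_rows - context, 1), where y[i] is the value of
    target_col one timestep after the last row of X[i].
    """
    n_rows = data.shape[0]
    # Window i covers rows i .. i + context - 1
    X = np.stack([data[i:i + context] for i in range(n_rows - context)])
    # The target for window i is row i + context of the chosen column
    y = data[context:, target_col].reshape(-1, 1)
    return X, y

# Random stand-in for the real (5000, 4) array: temp, pressure, speed, cost
data = np.random.random((5000, 4))
X_train, Y_train = make_windows(data, context=10, target_col=0)
print(X_train.shape)  # (4990, 10, 4)
print(Y_train.shape)  # (4990, 1)
```

With real data, replace the random array with your own (5000, 4) array. The target column stays in the inputs, so each window also contains the past values of the quantity being predicted.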

mpariente
  • The data points (instances) are observed at 15-minute intervals, if that's what you mean by timescale? And the y_data is basically one of the columns in the dataset. I have four columns and I want to predict the future values of one of the columns (say, temp) using the past values of that (temp) column and also the other columns (pressure, speed, cost). – user2559578 Dec 22 '17 at 19:54
  • I also want to understand how to reshape the 2D array to 3D for an LSTM. – user2559578 Dec 22 '17 at 20:30