6

I am looking at panel data, which is structured like this:

D = \left\{ (x^{(k)}_{t}, y^{(k)}_{t}) \,|\, t = t_0, \dots, t_k \right\}_{k=1}^{N}

where x^{(k)} denotes the k'th sequence, x^{(k)}_{t} denotes the k'th sequence's value at time t, and x^{(k)}_{i,t} is the i'th entry of the vector x^{(k)}_{t}. That is, x^{(k)}_{t} is the feature vector of the k'th sequence at time t. The sub- and superscripts mean the same for the label data y^{(k)}_{t}, except that here y^{(k)}_{t} \in \{0,1\}.

In plain words: the data set contains individuals observed over time, and for each time point at which an individual is observed, it is recorded whether they bought an item or not (y \in \{0,1\}).

I would like to use a recurrent neural network with LSTM units from Keras to predict whether a person will buy an item or not at a given time point. I have only been able to find examples of RNNs where each sequence has a single label (philipperemy link), not examples where each sequence element has its own label, as in the problem described above.

My approach so far has been to create a tensor with dimensions (samples, timesteps, features), but I cannot figure out how to format the labels so that Keras can match them with the features. It should be something like (samples, timesteps, 1), where the last dimension holds the single label value of 0 or 1; see the sketch below.
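For concreteness, here is a rough sketch of how I picture the two arrays (the sizes are made up, e.g. 116 features; only the shapes matter):

import numpy as np

# made-up sizes: 1000 individuals, 50 time steps, 116 features per time step
n_samples, n_timesteps, n_features = 1000, 50, 116

# feature tensor: X[k, t, i] corresponds to x^(k)_{i,t}
X = np.zeros((n_samples, n_timesteps, n_features))

# label tensor: y[k, t, 0] corresponds to y^(k)_t, one 0/1 value per time step
y = np.zeros((n_samples, n_timesteps, 1))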

Furthermore, some of the approaches I have come across split each sequence so that its subsequences are added to the training data, which increases the memory requirement tremendously (mlmastery link). This is infeasible in my case: I have multiple GBs of data and would not be able to hold it all in memory if I added subsequences.

The model I would like to use is something like this:

from keras.models import Sequential
from keras.layers import LSTM, Dense

mod = Sequential()
mod.add(LSTM(30, input_dim=116, return_sequences=True))
mod.add(LSTM(10))
mod.add(Dense(2))

Does anyone have experience working with panel data in keras?

i.n.n.m
Math_kv
  • Math mode doesn't seem to work; I followed this tutorial: http://meta.math.stackexchange.com/questions/5020/mathjax-basic-tutorial-and-quick-reference – Math_kv Mar 09 '17 at 11:57
  • I am wondering if you are still on stackoverflow and if you would mind posting your data and full model? I am trying to learn keras for panel and my data is similar to yours, but there is not much out there for panel keras examples. – John Stud Feb 02 '19 at 02:44
  • Hi John, unfortunately I don't have access to the data or the model anymore. – Math_kv Feb 07 '19 at 12:32

2 Answers

5

Try:

from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

mod = Sequential()
mod.add(LSTM(30, input_shape=(timesteps, features), return_sequences=True))
mod.add(LSTM(10, return_sequences=True))
mod.add(TimeDistributed(Dense(1, activation='sigmoid')))
# In the newest Keras version you can change the line above to mod.add(Dense(1, ...))

mod.compile(loss='binary_crossentropy', optimizer='rmsprop')
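As a quick shape check, something like the following should fit the model, assuming timesteps and features were set before building it, and using random placeholder data of the shapes described in the question:

import numpy as np

X = np.random.random((32, timesteps, features))       # 32 placeholder sequences
y = np.random.randint(0, 2, size=(32, timesteps, 1))  # one 0/1 label per time step

mod.fit(X, y, epochs=10, batch_size=8)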
Marcin Możejko
  • Does it matter what batch size you use for panel data? Can the batch size be more than 1 individual? – gannawag Aug 07 '17 at 14:44
0

It looks like the only option is to run the LSTM for each individual (i.e. each sequence) separately when the panel is unbalanced, meaning the sequences have different lengths, which I assume is the case since the end time t_k depends on k in your question.
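A minimal sketch of that per-individual loop, assuming the variable-length sequences are kept in plain Python lists X_list / y_list (placeholder names) and the model was built with input_shape=(None, features) and per-timestep outputs as in the other answer, so the time dimension is not fixed:

import numpy as np

# X_list[k] has shape (t_k, features), y_list[k] has shape (t_k, 1)
for X_k, y_k in zip(X_list, y_list):
    # feed one individual at a time as a batch of size 1
    mod.train_on_batch(X_k[np.newaxis, ...], y_k[np.newaxis, ...])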

Can