
I am beginning with time series classification and have some trouble understanding how my training set should be constructed. My current data look like this:

Timestamp       User ID  Feature 1   Feature 2   ...    Feature N  target
2002-10-30         1        0            0       ...       1        0
2002-10-31         2        0            1       ...       1        0
...
...
2017-10-30         1        0            0       ...       0        1
2017-10-31         2        0            1       ...       0        0

The features are one-hot encoded text features, recorded at time t for a given User ID. The target indicates whether an event occurred at time t. I would like to detect this event for every User ID in the dataset, given a new set of features at a new time t.

I understood from this paper that one way to model this is by using a "sliding windows classifier".

For any time t, I could aggregate together the features from t, t-1, ... t-n and set a more flexible target that would be "the event occurred or not at either t, t+1, ... t+n". Is this the correct way to build such a classifier?
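As a rough illustration of the sliding-window construction described above, here is a minimal sketch using NumPy. The function name `make_window_dataset` and the parameters `n_back` and `n_ahead` are hypothetical, and the sketch assumes the rows belong to a single User ID, are sorted by timestamp, and are evenly spaced in time.

```python
import numpy as np

def make_window_dataset(features, target, n_back, n_ahead):
    """Build flat sliding-window samples for one user (hypothetical helper).

    features: (T, F) array sorted by time; target: (T,) array of 0/1.
    Each sample concatenates the features from t-n_back .. t, and its label
    is 1 if the event occurs anywhere in t .. t+n_ahead.
    """
    features = np.asarray(features)
    target = np.asarray(target)
    T = len(target)
    X, y = [], []
    # Only keep time steps that have a full history and a full look-ahead.
    for t in range(n_back, T - n_ahead):
        X.append(features[t - n_back : t + 1].ravel())
        y.append(int(target[t : t + n_ahead + 1].any()))
    return np.array(X), np.array(y)

# Toy data: 6 time steps, 2 features, event at t = 3.
features = np.arange(12).reshape(6, 2)
target = np.array([0, 0, 0, 1, 0, 0])
X, y = make_window_dataset(features, target, n_back=2, n_ahead=1)
```

With several users, one would presumably call this once per User ID and concatenate the results, so that no window mixes rows from different users. The resulting 2D `X` and 1D `y` have the layout that scikit-learn estimators accept in `fit(X, y)`.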

I am also considering more recent approaches like "recurrent neural network architectures (LSTM)". How could I build a training set to feed this model from the dataset above?

ps: I plan to use scikit-learn / Keras to build the classifiers.

Thanks in advance for your time and answers.

erup
  • From what I see you have a simple binary classification problem (target is 0 or 1). So you have to find a relationship between input and target. No other pre-processing of data is required. You can use multiple techniques for this: Neural Networks, Genetic Programming etc ... – Mihai Oltean Oct 31 '17 at 07:12

1 Answer


There are a few ways to work with time series:

  1. Use an LSTM directly on fixed-length windows, so your data will have the shape (batch, window, data_features_dimensions...).
  2. Use Conv1D and other 1D methods, which can pick up local patterns along the time axis.
  3. Build a matrix from the windows. This may not look logical at first sight, but it lets you find shifted patterns, somewhat as an LSTM does.
  4. Treat your time series as a signal and use the same techniques as in audio processing, for example building a spectrogram that you can process with a CNN.
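The (batch, window, features) shape from the first item can be sketched as follows. This is only an illustration: the helper name `make_sequences` is hypothetical, and it assumes the rows belong to one user, are sorted by time, and number at least `window`. Each window is labelled with the target at its last time step, which is one possible choice among several.

```python
import numpy as np

def make_sequences(features, target, window):
    """Stack overlapping windows into a 3D array (hypothetical helper).

    features: (T, F) array sorted by time; target: (T,) array of 0/1.
    Returns X of shape (T - window + 1, window, F) and y aligned with
    the last time step of each window.
    """
    features = np.asarray(features, dtype="float32")
    target = np.asarray(target)
    n_windows = len(features) - window + 1
    # Window i covers rows i .. i + window - 1.
    X = np.stack([features[i : i + window] for i in range(n_windows)])
    y = target[window - 1 :]
    return X, y

# Toy data: 6 time steps, 2 features.
features = np.arange(12).reshape(6, 2)
target = np.array([0, 0, 0, 1, 0, 0])
X_seq, y_seq = make_sequences(features, target, window=3)
```

An array like `X_seq` matches the (batch, timesteps, features) input that Keras recurrent layers such as LSTM expect, and the same array can be fed to a Conv1D layer. For several users, the windows would be built per User ID and then concatenated along the first axis.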
Andrey Nikishaev