
How can I load a time series such as:

[image: sample of the multivariate time series dataframe (hourly metrik_0..metrik_2 readings per device_id / cohort_id), reproduced as code below]

into a neural network (LSTM)? So far, I have seen approaches where this matrix is transposed so that hours become columns and devices become rows (https://github.com/curiousily/Getting-Things-Done-with-Pytorch/blob/master/06.time-series-anomaly-detection-ecg.ipynb). Others create a custom dataloader and manually build the windows: http://www.feeny.org/custom-pytorch-dataset-class-for-timeseries-sequence-windows/

Why is there no more native way for the network to work directly on this raw input and learn the patterns / periodicity / anomalies? How can this multivariate time series of multiple devices be loaded natively into PyTorch or TensorFlow so that the resulting LSTM properly learns:

  • the state of an individual time series (at least within some window; it need not necessarily be the whole, potentially infinite, time series)

  • but also consider information from multiple series / devices / windows when making a prediction (see the windowing sketch after the sample data below)

    import pandas as pd
    from pandas import Timestamp

    df = pd.DataFrame({
        'hour': {0: Timestamp('2020-01-01 00:00:00'), 1: Timestamp('2020-01-01 00:00:00'), 2: Timestamp('2020-01-01 00:00:00'), 3: Timestamp('2020-01-01 00:00:00'), 4: Timestamp('2020-01-01 00:00:00'), 5: Timestamp('2020-01-01 01:00:00'), 6: Timestamp('2020-01-01 01:00:00'), 7: Timestamp('2020-01-01 01:00:00'), 8: Timestamp('2020-01-01 01:00:00'), 9: Timestamp('2020-01-01 01:00:00')},
        'metrik_0': {0: 2.020883621337143, 1: 2.808770093182167, 2: 2.5267618429653402, 3: 3.2709845883575346, 4: 3.7984105853602235, 5: 4.0385160093937795, 6: 4.643267594258785, 7: 1.3012379179114388, 8: 3.509304898336378, 9: 2.8664748765561208},
        'metrik_1': {0: 4.580434685779621, 1: 2.933188328317023, 2: 3.999229120882797, 3: 2.9099857745449706, 4: 4.6302055552849, 5: 4.012670194672169, 6: 3.697352153313931, 7: 4.855210603371005, 8: 2.2197913449032254, 9: 2.393605868973481},
        'metrik_2': {0: 3.680527279150989, 1: 2.511065648719921, 2: 3.8350007982479113, 3: 2.4063786290320333, 4: 3.231433617897482, 5: 3.8505378854180115, 6: 5.359150077287063, 7: 2.8966469424805386, 8: 4.554080028058399, 9: 3.3319064764061914},
        'cohort_id': {0: 1, 1: 2, 2: 1, 3: 2, 4: 2, 5: 1, 6: 2, 7: 2, 8: 1, 9: 2},
        'device_id': {0: 1, 1: 3, 2: 4, 3: 2, 4: 5, 5: 4, 6: 3, 7: 2, 8: 1, 9: 5},
    })
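
For context, here is a minimal sketch (just one possible phrasing, not an official API) of a PyTorch `Dataset` that builds per-device sliding windows from the toy `df` above, using `Tensor.unfold` instead of the slow loop-and-append pattern criticized in the comments:

    import torch
    from torch.utils.data import Dataset, DataLoader

    class DeviceWindowDataset(Dataset):
        """Sliding windows per device_id, vectorized via Tensor.unfold."""
        def __init__(self, df, feature_cols, window_len=2):
            xs, ids = [], []
            for device_id, g in df.sort_values('hour').groupby('device_id'):
                values = torch.tensor(g[feature_cols].to_numpy(), dtype=torch.float32)  # (T, F)
                if len(values) < window_len:
                    continue
                # unfold -> (num_windows, F, window_len); permute to (num_windows, window_len, F)
                xs.append(values.unfold(0, window_len, 1).permute(0, 2, 1))
                ids.extend([device_id] * len(xs[-1]))
            self.x = torch.cat(xs)
            self.device_ids = torch.tensor(ids)

        def __len__(self):
            return len(self.x)

        def __getitem__(self, idx):
            return self.x[idx], self.device_ids[idx]

    ds = DeviceWindowDataset(df, ['metrik_0', 'metrik_1', 'metrik_2'], window_len=2)
    for x, dev in DataLoader(ds, batch_size=4, shuffle=True):
        print(x.shape, dev)  # x: (batch, window_len, n_features), as expected by nn.LSTM(batch_first=True)
        break

This only covers the first bullet (per-device state within a window); combining information across devices would need, for example, cohort_id as an additional feature or a model that attends across series.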
  • Basically, it feels like the smart part of the network is the data loader (generating the windows) rather than the LSTM (when thinking about it in a very simplified way, as some weighted means ;)). Furthermore, the way most examples do it (iterating in a for loop and appending to a list) feels rather slow and inefficient. So I am really curious to figure out the right, and hopefully fast, way to do it. – Georg Heiler Nov 12 '20 at 16:16
  • TensorFlow seems to provide a window operator for a given dataset: https://www.tensorflow.org/api_docs/python/tf/data/Dataset#window (see the usage sketch after these comments). Is a similar one available in PyTorch? So far, I could not find one. – Georg Heiler Nov 12 '20 at 16:22
  • https://stackoverflow.com/questions/57893415/pytorch-dataloader-for-time-series-task already looks much better than a for loop and a list, but it does not handle the different devices (as outlined above). – Georg Heiler Nov 12 '20 at 16:26
  • I.e. in the domain of NLP / information retrieval, I know that attention (BERT / transformers) is used for a similar problem, and tokens are not passed in character windows. I would want to achieve something similar, i.e. native handling of such time series data. – Georg Heiler Nov 12 '20 at 17:10
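
Regarding the tf.data comment above, a minimal sketch of the documented window operator on a toy range dataset (the windowing parameters are illustrative):

    import tensorflow as tf

    # toy series 0..9 -> sliding windows of length 5, shifted by 1
    ds = tf.data.Dataset.range(10)
    windows = ds.window(5, shift=1, drop_remainder=True)
    # each window is itself a Dataset of scalars; flatten into batched tensors
    windows = windows.flat_map(lambda w: w.batch(5))
    for w in windows:
        print(w.numpy())  # [0 1 2 3 4], then [1 2 3 4 5], ...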

1 Answer


tsai provides a nice function for this purpose, a sliding-window preparation helper: https://github.com/timeseriesAI/tsai/blob/62e9348d9e29a6b5f628879bd77056c11db5c0ab/tsai/data/preparation.py#L119
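
A minimal usage sketch, assuming the linked helper is tsai's SlidingWindow from tsai.data.preparation (exact keyword names such as window_len and horizon may differ between tsai versions, so treat this as illustrative rather than authoritative):

    import numpy as np
    from tsai.data.preparation import SlidingWindow

    t = np.arange(20)  # toy univariate series

    # build windowed inputs X and next-step targets y
    X, y = SlidingWindow(window_len=5, horizon=1)(t)
    print(X.shape, y.shape)  # e.g. (15, 1, 5) and (15,); shape conventions depend on the version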
