0

I have two dataframes df1 and df2.

df1:

    Id  Date         Remark 
0   28  2010-04-08   xx
1   29  2010-10-10   yy
2   30  2012-12-03   zz
3   31  2010-03-16   aa

df2:

        Id  Timestamp                Value  Site
0       28  2010-04-08 13:20:15.120  125.0  93
1       28  2010-04-08 13:20:16.020  120.0  94
2       28  2010-04-08 13:20:18.020  135.0  95
3       28  2010-04-08 13:20:18.360  140.0  96
                   ...
1000    29  2010-06-15 05:04:15.120  16.0   101
1001    29  2010-06-15 05:05:16.320  14.0   101
                   ...

I would like to select all Value data 10 days before/including the Date in df1 from df2 for the same Id. For example, for Id 28, Date is 2010-04-08, so select Value where Timestamp is between 2010-03-30 00:00:00 and 2010-04-08 23:59:59(inclusive).

Then, I want to resample the Value data using forward fill ffill and backward fill bfill at 1min frequency so that there will be 10 x 24 x 60 = 14400 values exactly for each Id.

Lastly, I'd like to rearrange the dataframe horizontally with transpose.

Expected output looks like this:

    Id  Date         value1  value2 ...  value14399  value14400  Remark 
0   28  2010-04-08   125.0   125.0       ...         ...         xx     (value1 and value2 and following values before "2010-04-08 13:20:15.120" are 125.0 as a result of backward fill since the first value for Id 29 is 125.0)
1   29  2010-10-10   16.0    16.0        ...                     yy
        ...

I am not sure what's the best way to approach this problem since I'm adding "another time series dimension" to the dataframe. Any idea is appreciated.

nilsinelabore
  • 4,143
  • 17
  • 65
  • 122
  • Would you mind explaining the problem on a higher level so I can figure out if there's an alternative to this approach? – The Singularity Sep 13 '21 at 05:27
  • Hi, Luke, sure, the goal here is to prepare data for a classification problem. I'm joining `Remark` which are the "labels"(dependent variable) to the time series data (independent variable), to make each row a sample with a unique `Id` followed by the corresponding time series according to the time (`Date`) of the event (`Id`) happened. I hope that helps to make things clearer ... – nilsinelabore Sep 13 '21 at 06:04
  • 1
    Could try a rolling between df1 and df2 with `pandas.merge_asof` using `Id` as the joining key and setting a timedelta of 10days for the `tolerance` parameter. Of course this doesn't solve everything but perhaps just a nudge in the right direction of the first few steps to achieve your goal https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.merge_asof.html – kelvt Sep 13 '21 at 06:55
  • @kelvt Thank you for the idea, I will have a look! – nilsinelabore Sep 13 '21 at 07:08

0 Answers0