Select data, resample and join horizontally in Python

Question

I have two dataframes df1 and df2.

df1:

    Id  Date         Remark 
0   28  2010-04-08   xx
1   29  2010-10-10   yy
2   30  2012-12-03   zz
3   31  2010-03-16   aa

df2:

        Id  Timestamp                Value  Site
0       28  2010-04-08 13:20:15.120  125.0  93
1       28  2010-04-08 13:20:16.020  120.0  94
2       28  2010-04-08 13:20:18.020  135.0  95
3       28  2010-04-08 13:20:18.360  140.0  96
                   ...
1000    29  2010-06-15 05:04:15.120  16.0   101
1001    29  2010-06-15 05:05:16.320  14.0   101
                   ...

I would like to select all Value data 10 days before/including the Date in df1 from df2 for the same Id. For example, for Id 28, Date is 2010-04-08, so select Value where Timestamp is between 2010-03-30 00:00:00 and 2010-04-08 23:59:59(inclusive).

Then, I want to resample the Value data using forward fill ffill and backward fill bfill at 1min frequency so that there will be 10 x 24 x 60 = 14400 values exactly for each Id.

Lastly, I'd like to rearrange the dataframe horizontally with transpose.

Expected output looks like this:

    Id  Date         value1  value2 ...  value14399  value14400  Remark 
0   28  2010-04-08   125.0   125.0       ...         ...         xx     (value1 and value2 and following values before "2010-04-08 13:20:15.120" are 125.0 as a result of backward fill since the first value for Id 29 is 125.0)
1   29  2010-10-10   16.0    16.0        ...                     yy
        ...

I am not sure what's the best way to approach this problem since I'm adding "another time series dimension" to the dataframe. Any idea is appreciated.

Would you mind explaining the problem on a higher level so I can figure out if there's an alternative to this approach? — The Singularity, Sep 13 '21 at 05:27
Hi, Luke, sure, the goal here is to prepare data for a classification problem. I'm joining `Remark` which are the "labels"(dependent variable) to the time series data (independent variable), to make each row a sample with a unique `Id` followed by the corresponding time series according to the time (`Date`) of the event (`Id`) happened. I hope that helps to make things clearer ... — nilsinelabore, Sep 13 '21 at 06:04
Could try a rolling between df1 and df2 with `pandas.merge_asof` using `Id` as the joining key and setting a timedelta of 10days for the `tolerance` parameter. Of course this doesn't solve everything but perhaps just a nudge in the right direction of the first few steps to achieve your goal https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.merge_asof.html — kelvt, Sep 13 '21 at 06:55

Select data, resample and join horizontally in Python

0 Answers0