1

Using a data set, I wanted to reindex the below around a center date using pandas. For instance, on the first data set I would like to generate an index around 12/22/2009 and give it the value 0 and anything before would be -1,-2,-3,-4 etc and after would be 1,2,3,4,5. Does anyone have any suggestions how to manage the reindexing dynamically? I anticipate having to merge the data sets together using df.merge(how = 'outer') and add an additional column to average the values.

Theoretical Data Set

Reindex to below

enter image description here

Merge and return centered around 0

user3757405
  • 51
  • 1
  • 8

1 Answers1

1

Is it what you expect?

Input data:

>>> df1
         date     value
0  2009-12-14  4.829653
1  2009-12-15  4.941522
2  2009-12-16  4.066554
3  2009-12-17  4.374824
4  2009-12-18  4.919813
5  2009-12-21  4.774057
6  2009-12-22  4.431368
7  2009-12-23  4.794788
8  2009-12-24  4.496606
9  2009-12-25  4.408936
10 2009-12-28  4.467351
11 2009-12-29  4.834504

>>> df2
         date     value
0  2009-11-24  4.852626
1  2009-11-25  4.785530
2  2009-11-26  4.676981
3  2009-11-27  4.030883
4  2009-11-30  4.563335
5  2009-12-01  4.226778
6  2009-12-02  4.073694
7  2009-12-03  4.715326
8  2009-12-04  4.709705
9  2009-12-07  4.469377
10 2009-12-08  4.727273
11 2009-12-09  4.923308
12 2009-12-10  4.742808
13 2009-12-11  4.168057

>>> pivot_dates
['2009-12-21','2009-12-04']
dfs = [df1[::-1].set_index(df1.index - df1['date'].eq(pivot_dates[0]).argmax() - 1)['value'],
       df2[::-1].set_index(df2.index - df2['date'].eq(pivot_dates[1]).argmax() - 1)['value']]

df = pd.concat(dfs, axis='columns')
df.columns = ['A', 'B']

Output result:

>>> df
           A         B
-9       NaN  4.168057
-8       NaN  4.742808
-7       NaN  4.923308
-6  4.834504  4.727273
-5  4.467351  4.469377
-4  4.408936  4.709705
-3  4.496606  4.715326
-2  4.794788  4.073694
-1  4.431368  4.226778
0   4.774057  4.563335
1   4.919813  4.030883
2   4.374824  4.676981
3   4.066554  4.785530
4   4.941522  4.852626
5   4.829653       NaN
Corralien
  • 109,409
  • 8
  • 28
  • 52
  • yes but the date indexes are all different, like df2 could be in june and df3 could be in may. So i would have a list of dates to center 0 around it – user3757405 Aug 14 '21 at 23:15
  • Additionally because they are business days df.index - pd.to_datetime wouldn't work because of the inherent gaps of weekends and holidays – user3757405 Aug 14 '21 at 23:17
  • 1
    To find the center, do you take the min and the max dates and find the middle date? – Corralien Aug 14 '21 at 23:18
  • No, they are specific dates. The example I provided was a hypothetical one. It would be a seperate data frame. DF = ['2009-12-21','2009-12-04','2009-11-11','2009-10-23'] in my example – user3757405 Aug 14 '21 at 23:21
  • 1
    I updated my answer. I think It's right now (I hope). I used your 2 first dataframes with business days (`bdate_range`) to get same gaps. By default, the index of a dataframe is a `RangeIndex`. I just reversed and shifted the index. – Corralien Aug 14 '21 at 23:51