1

This question is conceptually similar to this one.

I've got two DataFrames. One has a set of values corresponding to certain times and dates (df_1). The other has a set of values corresponding to certain months (df_2), specified as the first instant of the first day of a month (e.g. 2015-07-01 00:00:00.00). I want to merge these DataFrames such that the values of df_2 for months get applied to all dates and times of df_1 for the corresponding months.

So, here is df_1:

|DatetimeIndex|value_1|
|-------------|-------|
|2015-07-18   |10     |
|2015-07-18   |11     |
|2015-07-19   |12     |
|2015-07-20   |13     |
|2015-07-20   |14     |
|2015-07-20   |15     |
|2015-07-21   |16     |
|2015-07-22   |17     |
|2015-07-22   |18     |
|2015-07-23   |19     |
|2015-08-11   |20     |

and here is df_2:

|DatetimeIndex|value_2|
|-------------|-------|
|2015-07-01   |100    |
|2015-08-01   |200    |

I want to merge them like this:

|DatetimeIndex|value_1|value_2|
|-------------|-------|-------|
|2015-07-18   |10     |100    |
|2015-07-18   |11     |100    |
|2015-07-19   |12     |100    |
|2015-07-20   |13     |100    |
|2015-07-20   |14     |100    |
|2015-07-20   |15     |100    |
|2015-07-21   |16     |100    |
|2015-07-22   |17     |100    |
|2015-07-22   |18     |100    |
|2015-07-23   |19     |100    |
|2015-08-11   |20     |200    |

So, the value_2 exists throughout each of the months.

How can this merge be done?

From that previous answer, a solution of the following (flawed) form seems like the right approach, but it fails:

idx = df_1.index.union(df_2.index)
#df_1.join(df_2.loc[idx.date].set_index(idx), how = "outer")
df_1.join(df_2.loc[idx.month].set_index(idx), how = "outer")
BlandCorporation
  • 1,324
  • 1
  • 15
  • 33

2 Answers2

2

just use merge

      pd.merge(df_1.reset_index(), df_2, left_on=[df_1.index.year, df_1.index.month], 
      right_on=[df_2.index.year, df_2.index.month])
Steven G
  • 16,244
  • 8
  • 53
  • 77
  • This works for the sample, but to the extent the data crosses multiple years, it will not – chrisb Oct 10 '17 at 17:10
  • @StevenG Hey, thanks for your solution on that. I hadn't seen lists being used for the `left_on` and `right_on` parameters before. It's cool. – BlandCorporation Oct 11 '17 at 15:37
2

Option 1
pd.merge_asof

pd.merge_asof(df_1, df_2, left_index=True, right_index=True)

               value_1  value_2
DatetimeIndex                  
2015-07-18          10      100
2015-07-18          11      100
2015-07-19          12      100
2015-07-20          13      100
2015-07-20          14      100
2015-07-20          15      100
2015-07-21          16      100
2015-07-22          17      100
2015-07-22          18      100
2015-07-23          19      100
2015-08-11          20      200

Option 2
index manipulation

df_1.set_index(
    df_1.index - pd.offsets.MonthBegin()
).join(df_2).set_index(df_1.index)

               value_1  value_2
DatetimeIndex                  
2015-07-18          10      100
2015-07-18          11      100
2015-07-19          12      100
2015-07-20          13      100
2015-07-20          14      100
2015-07-20          15      100
2015-07-21          16      100
2015-07-22          17      100
2015-07-22          18      100
2015-07-23          19      100
2015-08-11          20      200
piRSquared
  • 285,575
  • 57
  • 475
  • 624