0

I am trying to merge two Pandas DataFrames on DatetimeIndex columns. Each df contains different daily measurement data for an overlapping period, one day with several columns per line. I want to end up with a df merged on the DatetimeIndex with still one day per line ('inner'), and all measurement columns of the original two dfs, but a single column for the common dates. This works with pd.merge() as long as the dates are strings, but when I convert them to datetimes, pd.merge() tells me to use pd.concat() instead. A minimal working example would look like this:

df1 = pd.DataFrame(data=pd.date_range('2022-01-02','2022-01-04'), columns=['dtm'])
df1['var1'] = np.random.random(len(df1))

df2 = pd.DataFrame(data=pd.date_range('2022-01-01','2022-01-05'), columns=['dtm'])
df2['var2'] = np.random.random(len(df2))*100

print(pd.concat([df1,df2], axis=1, join='inner'))

However, this gives me

         dtm      var1        dtm       var2
0 2022-01-02  0.942075 2022-01-01  82.021265
1 2022-01-03  0.857628 2022-01-02  62.548033
2 2022-01-04  0.432774 2022-01-03   3.883448

i.e., there are two dtm columns, and their dates are offset. How can I tell pd.concat() to merge on the dtm columns?

AstroFloyd
  • 405
  • 5
  • 14

1 Answers1

0

Create DatetimeIndex in both DataFrames:

print(pd.concat([df1.set_index('dtm'),df2.set_index('dtm')], axis=1, join='inner'))
                var1       var2
dtm                            
2022-01-02  0.801879  81.125161
2022-01-03  0.473338  89.862602
2022-01-04  0.950536  63.681588

For me merge working correct (columns dtm in both DataFrames are filled by datetimes):

print(df1.merge(df2, on='dtm'))

         dtm      var1       var2
0 2022-01-02  0.875464  11.180980
1 2022-01-03  0.282619  73.407783
2 2022-01-04  0.802322  88.379740
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252