I am trying to merge two Pandas DataFrames on DatetimeIndex columns. Each df contains different daily measurement data for an overlapping period, one day with several columns per line. I want to end up with a df merged on the DatetimeIndex with still one day per line ('inner'
), and all measurement columns of the original two dfs, but a single column for the common dates. This works with pd.merge()
as long as the dates are strings, but when I convert them to datetimes, pd.merge()
tells me to use pd.concat()
instead. A minimal working example would look like this:
df1 = pd.DataFrame(data=pd.date_range('2022-01-02','2022-01-04'), columns=['dtm'])
df1['var1'] = np.random.random(len(df1))
df2 = pd.DataFrame(data=pd.date_range('2022-01-01','2022-01-05'), columns=['dtm'])
df2['var2'] = np.random.random(len(df2))*100
print(pd.concat([df1,df2], axis=1, join='inner'))
However, this gives me
dtm var1 dtm var2
0 2022-01-02 0.942075 2022-01-01 82.021265
1 2022-01-03 0.857628 2022-01-02 62.548033
2 2022-01-04 0.432774 2022-01-03 3.883448
i.e., there are two dtm
columns, and their dates are offset. How can I tell pd.concat()
to merge on the dtm
columns?