0

Why pandas does not let me make a join on datetimeindex? Here is a small code block to produce the issue that i am having.

dti = pd.date_range('2019-01-01','2020-01-01')
df1 = pd.DataFrame({'date':dti})
df2 = pd.DataFrame({'date':dti})
df1.join(df2,on="date")

I tried to convert columns to datetime object first but still no luck.

dti = pd.date_range('2019-01-01','2020-01-01')
dto = pd.to_datetime(dti)
df1 = pd.DataFrame({'date':dto})
df2 = pd.DataFrame({'date':dto})
df1.join(df2,on="date")

Both of them are failing with below exception.

ValueError: You are trying to merge on datetime64[ns] and int64 columns. If you wish to proceed you should use pd.concat
moth
  • 427
  • 1
  • 4
  • 18

1 Answers1

1

It is a known Pandas issue: Bug . Until fixed, won’t merge do the job:

import pandas as pd

dti = pd.date_range('2019-01-01','2020-01-01')
dto = pd.to_datetime(dti)
df1 = pd.DataFrame({'date':dto})
df2 = pd.DataFrame({'date':dto})

df1.merge(df2,on="date", how='inner')

I believe this is a bug in Pandas. Somehow join behave as if one of your date is not a date data type.

Prayson W. Daniel
  • 14,191
  • 4
  • 51
  • 57
  • It's not a bug. It's supposed to only join on the index for `other` and the index for `df2` happens to be `df2.index.dtype # dtype('int64')`. It can be a column _or_ index in the calling DataFrame, but can only be the index in the `other` `DataFrame`. – Darkonaut Dec 15 '19 at 01:02
  • There are some examples in the [docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html). But you're right with using `.merge()` here. Another option would be `df1.join(df2.set_index('date'), on='date')`. – Darkonaut Dec 15 '19 at 13:01