1

I tried to use isin() to filter datetime column in my df. Found the following strange behavior:

Let's define a dataframe with a sole date value:

>>>from datetime import datetime
>>>date = datetime.fromisoformat('2011-11-04')
>>>df = pd.DataFrame({'date': [date]})
>>>print(df)
        date
0 2011-11-04

We set a date range of 2 days:

>>>date_rg = pd.date_range(end='2011-11-04', periods=2)

The date is expected to be in the range:

>>>date in date_rg
True

However, if we use isin(), strange things happen!

>>>df.date.dt.date.isin(date_rg)
0    False
Name: date, dtype: bool

With further investigation, I find even stranger behavior:

>>>for i in range(1,10):
>>>    date_rg = pd.date_range(end='2011-11-04', periods=i)
>>>    print('i =', i)
>>>    print(df.date.dt.date.isin(date_rg))

i = 1
0    False
Name: date, dtype: bool
i = 2
0    False
Name: date, dtype: bool
i = 3
0    True
Name: date, dtype: bool
i = 4
0    False
Name: date, dtype: bool
i = 5
0    True
Name: date, dtype: bool
i = 6
0    True
Name: date, dtype: bool
i = 7
0    True
Name: date, dtype: bool
i = 8
0    True
Name: date, dtype: bool
i = 9
0    True
Name: date, dtype: bool

Only when period = 1, 2 and 4 days, it returns False! I strongly suspect it's a bug of Pandas lib. I'm using pandas:1.0.5 with numpy:1.19.0.

By the way, we are able to contour this by using date_range.date:

>>>df.date.dt.date.isin(date_rg.date)
0    True
Name: date, dtype: bool

*Related:
isin-function-does-not-work-for-dates.
Issue 5021.

Lucecpkn
  • 971
  • 6
  • 9
  • As per the duplicate, there used to be an error message in `0.25` that stated things would not necessrily compare equal. `datetime.date` and `np.timedelta64[ns]` are different objects. Though they should represent the same thing, there's no reason they should be equal. Compare `datetime64` with `datetime64` or `datetime.date` with `datetime.date` and all comparisons work properly. – ALollz Jul 16 '20 at 16:56
  • Thank you @ALollz for pointing out the root cause! It's still really confusing though: e.g. in my last example with different periods, I would expect a consistent False since they are different types, as you point out. But why are they "randomly" True in some cases? Just my uninformed guess: Perhaps it's because the `datetime.date` and `datetime.date` objects are very close in their underlying lower-level encodings, so sometimes their encodings of the same date overlap? – Lucecpkn Jul 16 '20 at 20:43

0 Answers0