I tried to use isin()
to filter datetime column in my df. Found the following strange behavior:
Let's define a dataframe with a sole date value:
>>>from datetime import datetime
>>>date = datetime.fromisoformat('2011-11-04')
>>>df = pd.DataFrame({'date': [date]})
>>>print(df)
date
0 2011-11-04
We set a date range of 2 days:
>>>date_rg = pd.date_range(end='2011-11-04', periods=2)
The date is expected to be in the range:
>>>date in date_rg
True
However, if we use isin()
, strange things happen!
>>>df.date.dt.date.isin(date_rg)
0 False
Name: date, dtype: bool
With further investigation, I find even stranger behavior:
>>>for i in range(1,10):
>>> date_rg = pd.date_range(end='2011-11-04', periods=i)
>>> print('i =', i)
>>> print(df.date.dt.date.isin(date_rg))
i = 1
0 False
Name: date, dtype: bool
i = 2
0 False
Name: date, dtype: bool
i = 3
0 True
Name: date, dtype: bool
i = 4
0 False
Name: date, dtype: bool
i = 5
0 True
Name: date, dtype: bool
i = 6
0 True
Name: date, dtype: bool
i = 7
0 True
Name: date, dtype: bool
i = 8
0 True
Name: date, dtype: bool
i = 9
0 True
Name: date, dtype: bool
Only when period = 1, 2 and 4 days, it returns False!
I strongly suspect it's a bug of Pandas lib.
I'm using pandas:1.0.5
with numpy:1.19.0
.
By the way, we are able to contour this by using date_range.date
:
>>>df.date.dt.date.isin(date_rg.date)
0 True
Name: date, dtype: bool
*Related:
isin-function-does-not-work-for-dates.
Issue 5021.