34

I have a pandas data frame with date column, and I am trying to add a new column of boolean values indicating whether a given date is a holiday or not.

Following is the code, but it does not work (all the values are False) because the types seem to be different, and I can't figure out how to get the 'date' in the pandas data frame to be of the same type as the holidays:

cal = USFederalHolidayCalendar()
holidays = cal.holidays(start=train_df['date'].min(),
                        end=train_df['date'].max()).to_pydatetime()
train_df['holiday'] = train_df['date'].isin(holidays)
print type(train_df['date'][1])
print type(holidays[0])
Gopala
  • 10,363
  • 7
  • 45
  • 77

2 Answers2

52

You don't need to convert anything. Just compare straight up. pandas is smart enough to compare a lot of different types with regards to dates and times. You have to have a slightly more esoteric format if you're having issues with date/time compatibility.

import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar as calendar

dr = pd.date_range(start='2015-07-01', end='2015-07-31')
df = pd.DataFrame()
df['Date'] = dr

cal = calendar()
holidays = cal.holidays(start=dr.min(), end=dr.max())

df['Holiday'] = df['Date'].isin(holidays)
print df

Result:

         Date Holiday
0  2015-07-01   False
1  2015-07-02   False
2  2015-07-03    True
3  2015-07-04   False
4  2015-07-05   False
5  2015-07-06   False
6  2015-07-07   False
7  2015-07-08   False
8  2015-07-09   False
9  2015-07-10   False
10 2015-07-11   False
11 2015-07-12   False
12 2015-07-13   False
13 2015-07-14   False
14 2015-07-15   False
15 2015-07-16   False
16 2015-07-17   False
17 2015-07-18   False
18 2015-07-19   False
19 2015-07-20   False
20 2015-07-21   False
21 2015-07-22   False
22 2015-07-23   False
23 2015-07-24   False
24 2015-07-25   False
25 2015-07-26   False
26 2015-07-27   False
27 2015-07-28   False
28 2015-07-29   False
29 2015-07-30   False
30 2015-07-31   False

Note that July 4, 2015 falls on a Saturday.

WGS
  • 13,969
  • 4
  • 48
  • 51
8

I had the same problem as the author, and the other fix provided didn't work for me. This is what did work:

train_df['holiday'] = train_df['date'].dt.date.astype('datetime64').isin(holidays)
  • 1
    As I said, a `slightly more esoteric format`. If you needed to convert to `datetime64`, then your `date` column has a format that `pandas` failed to identify straight up. For cases like this, a conversion definitely is necessary. Regarding OP's request, he converted his dates into "proper" Python data type via `to_pydatetime()`, which `pandas` can convert with nary an issue. :) Still, +1 for addressing a case I didn't bother catching. – WGS Jul 15 '16 at 04:52
  • 1
    Not sure if it was because my 'date' column also had times in it, but I had to use this solution instead of the answer above. Thanks MH. – elPastor Nov 30 '16 at 21:49