3

I have two columns in pandas dataframe, start date and end date.

I want to know if any holiday is included in the time period of each row.

I want to create a new column to show yes or no.

id  Start Date  End Date
0   2019-09-27  2019-10-06
1   2019-10-09  2019-10-22
2   2019-05-04  2019-05-15
3   2019-09-18  2019-09-29

I know how to check if a specific date is holiday or not

But how can I check the duration of each row?

import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar as calendar

df = pd.DataFrame({'Start Date':['2019-09-27', '2019-10-09', '2019-05-04', '2019-09-18'],
'End Date':['2019-10-06', '2019-10-22', '2019-05-15', '2019-09-29']})

# To check if a specific date is a holiday or not 
holidays = calendar().holidays(start = df['Start Date'].min(), end = df['Start Date'].max())
df['Holiday'] = df['Start Date'].isin(holidays)

# This can only check if the start date is a holiday
id  Start Date  Holiday
0   2019-09-27  False
1   2019-10-09  False
2   2019-05-04  False
3   2019-09-18  False

# But how can I check the duration between df['Start Date'] and df['End Date'] of each row?

I expect that there is another Boolean column to check if each row (id) is including any holiday between the duration of start date and end date

id  Start Date  End Date    Holiday
0   2019-09-27  2019-10-06  True
1   2019-10-09  2019-10-22  False
2   2019-05-04  2019-05-15  True
3   2019-09-18  2019-09-29  False
Jerry
  • 33
  • 5

2 Answers2

4

What I will do

#holidays = calendar().holidays(start = df['Start Date'].min(), end = df['End Date'].max())
l=[any(x<=z and y>=z for z in holidays.tolist()) for x , y in zip(df['Start Date'],df['End Date'])]
[False, True, False, False]
df['Holiday']=l

Also check When should I ever want to use pandas apply() in my code?

BENY
  • 317,841
  • 20
  • 164
  • 234
2

Apply the checking function to each row of the dataframe:

df['Holiday'] = df.apply(lambda x:\
                   calendar().holidays(start=x['Start Date'],
                                       end=x['End Date']).size, axis=1)\
  .astype(bool) # Convert the answer to a boolean
#0    False
#1     True
#2    False
#3    False
DYZ
  • 55,249
  • 10
  • 64
  • 93