I have generated this df
PredictionTargetDateEOM PredictionTargetDateBOM DayAfterTargetDateEOM business_days
0 2018-12-31 2018-12-01 2019-01-01 20
1 2019-01-31 2019-01-01 2019-02-01 21
2 2019-02-28 2019-02-01 2019-03-01 20
3 2018-11-30 2018-11-01 2018-12-01 21
4 2018-10-31 2018-10-01 2018-11-01 23
... ... ... ... ...
172422 2020-10-31 2020-10-01 2020-11-01 22
172423 2020-11-30 2020-11-01 2020-12-01 20
172424 2020-12-31 2020-12-01 2021-01-01 22
172425 2020-09-30 2020-09-01 2020-10-01 21
172426 2020-08-31 2020-08-01 2020-09-01 21
with this code:
predicted_df['PredictionTargetDateBOM'] = predicted_df.apply(lambda x: pd.to_datetime(x['PredictionTargetDateEOM']).replace(day=1), axis = 1) #Get first day of the target month
predicted_df['PredictionTargetDateEOM'] = pd.to_datetime(predicted_df['PredictionTargetDateEOM'])
predicted_df['DayAfterTargetDateEOM'] = predicted_df['PredictionTargetDateEOM'] + timedelta(days=1) #Get the first day of the month after target month. i.e. M+2
predicted_df['business_days_bankers'] = predicted_df.apply(lambda x: np.busday_count(x['PredictionTargetDateBOM'].date(), x['DayAfterTargetDateEOM'].date(), holidays=[list(holidays.US(years=x['PredictionTargetDateBOM'].year).keys())[index] for index in [list(holidays.US(years=x['PredictionTargetDateBOM'].year).values()).index(item) for item in rocket_holiday_including_observed if item in list(holidays.US(years=x['PredictionTargetDateBOM'].year).values())]] ), axis = 1) #Count number of business days of the target month
That counts the number of business days in the month of the PredictionTargetDateEOM
column based on Python's holiday
package, which is a dictionary that includes the following holidays:
2022-01-01 New Year's Day
2022-01-17 Martin Luther King Jr. Day
2022-02-21 Washington's Birthday
2022-05-30 Memorial Day
2022-06-19 Juneteenth National Independence Day
2022-06-20 Juneteenth National Independence Day (Observed)
2022-07-04 Independence Day
2022-09-05 Labor Day
2022-10-10 Columbus Day
2022-11-11 Veterans Day
2022-11-24 Thanksgiving
2022-12-25 Christmas Day
2022-12-26 Christmas Day (Observed)
However, I would like to replicate the business day count but instead use this list called rocket_holiday
as the reference for np.busday_count()
:
["New Year's Day",
'Martin Luther King Jr. Day',
'Memorial Day',
'Independence Day',
'Labor Day',
'Thanksgiving',
'Christmas Day',
"New Year's Day (Observed)",
'Martin Luther King Jr. Day (Observed)',
'Memorial Day (Observed)',
'Independence Day (Observed)',
'Labor Day (Observed)',
'Thanksgiving (Observed)',
'Christmas Day (Observed)']
So I've added this line
predicted_df['business_days_rocket'] = predicted_df.apply(lambda x: np.busday_count(x['PredictionTargetDateBOM'].date(), x['DayAfterTargetDateEOM'].date(), holidays=[rocket_holiday]), axis = 1)
But I get the ValueError listed in the title of this question. I think the problem is that the first list is a dictionary with the dates of those holidays, so I need to write a function that could generate those dates for the holidays of the second list in a dynamic fashion based on year, and convert that list into a dictionary. Is there a way to do that with Python's holiday
package so that I don't have to hard-code the dates in?