I have this DataFrame that contains the first and last day of each month and the number of business days in that month, with holidays taken from the Python holidays library:
PredictionTargetDateEOM PredictionTargetDateBOM DayAfterTargetDateEOM business_days
0 2018-12-31 2018-12-01 2019-01-01 20
1 2019-01-31 2019-01-01 2019-02-01 21
2 2019-02-28 2019-02-01 2019-03-01 20
3 2018-11-30 2018-11-01 2018-12-01 21
4 2018-10-31 2018-10-01 2018-11-01 23
... ... ... ... ...
172422 2020-10-31 2020-10-01 2020-11-01 22
172423 2020-11-30 2020-11-01 2020-12-01 20
172424 2020-12-31 2020-12-01 2021-01-01 22
172425 2020-09-30 2020-09-01 2020-10-01 21
172426 2020-08-31 2020-08-01 2020-09-01 21
Generated with this code:
predicted_df['PredictionTargetDateBOM'] = predicted_df.apply(lambda x: pd.to_datetime(x['PredictionTargetDateEOM']).replace(day=1), axis = 1) #Get first day of the target month
predicted_df['PredictionTargetDateEOM'] = pd.to_datetime(predicted_df['PredictionTargetDateEOM'])
predicted_df['DayAfterTargetDateEOM'] = predicted_df['PredictionTargetDateEOM'] + timedelta(days=1) #Get the first day of the month after target month. i.e. M+2
predicted_df['business_days'] = predicted_df.apply(
    lambda x: np.busday_count(
        x['PredictionTargetDateBOM'].date(),
        x['DayAfterTargetDateEOM'].date(),
        holidays=[
            list(holidays.US(years=x['PredictionTargetDateBOM'].year).keys())[index]
            for index in [
                list(holidays.US(years=x['PredictionTargetDateBOM'].year).values()).index(item)
                for item in rocket_holiday_including_observed
                if item in list(holidays.US(years=x['PredictionTargetDateBOM'].year).values())
            ]
        ],
    ),
    axis=1,
)  # Count number of business days of the target month
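As a sanity check on the business_days values above, here is a minimal pure-Python sketch of the same count. It assumes np.busday_count's default behaviour (Monday to Friday week mask, half-open range [begin, end)). The helper name count_business_days is made up for illustration, and the holiday dates are typed in by hand rather than pulled from the holidays library.

```python
from datetime import date, timedelta


def count_business_days(begin, end, holiday_dates=()):
    """Count Mon-Fri days in [begin, end) that are not in holiday_dates."""
    skip = set(holiday_dates)
    count = 0
    day = begin
    while day < end:
        # weekday() is 0 for Monday through 4 for Friday
        if day.weekday() < 5 and day not in skip:
            count += 1
        day += timedelta(days=1)
    return count


# December 2018: Christmas Day falls on a Tuesday
dec_2018 = count_business_days(date(2018, 12, 1), date(2019, 1, 1),
                               [date(2018, 12, 25)])
# January 2019: New Year's Day and Martin Luther King Jr. Day
jan_2019 = count_business_days(date(2019, 1, 1), date(2019, 2, 1),
                               [date(2019, 1, 1), date(2019, 1, 21)])
print(dec_2018, jan_2019)  # 20 21
```

Both results match rows 0 and 1 of the table, so the existing column looks consistent with that reading of busday_count.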
I want to add this column:
predicted_df['business_days_rocket'] = predicted_df.apply(
    lambda x: np.busday_count(
        x['PredictionTargetDateBOM'].date(),
        x['DayAfterTargetDateEOM'].date(),
        holidays=[list({k: v for k, v in holidays.US(years=x['PredictionTargetDateBOM'].year).items() if v in my_set})],
    ),
    axis=1,
)
where my_set is built from this list of holiday names:
my_list = [
"New Year's Day",
"Martin Luther King Jr. Day",
"Memorial Day",
"Independence Day",
"Labor Day",
"Thanksgiving",
"Christmas Day",
"New Year's Day (Observed)",
"Martin Luther King Jr. Day (Observed)",
"Memorial Day (Observed)",
"Independence Day (Observed)",
"Labor Day (Observed)",
"Thanksgiving (Observed)",
"Christmas Day (Observed)",
]
my_set = set(my_list)
But I get ValueError: holidays must be a provided as a one-dimensional array.
I don't understand the error, because my list is one-dimensional. The output of list({k: v for k, v in holidays.US(years=x['PredictionTargetDateBOM'].year).items() if v in my_set}) is:
[datetime.date(2022, 1, 1),
datetime.date(2022, 1, 17),
datetime.date(2022, 5, 30),
datetime.date(2022, 7, 4),
datetime.date(2022, 9, 5),
datetime.date(2022, 11, 24),
datetime.date(2022, 12, 25),
datetime.date(2022, 12, 26)]
This has the same format as the output of the expression used in the working column:
holidays=[list(holidays.US(years=x['PredictionTargetDateBOM'].year).keys())
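To compare the two shapes directly, here is a minimal sketch with a hand-built dict standing in for holidays.US(years=2022). The dict, the wanted set, and the nesting_depth helper are all made up for illustration. It contrasts the bare list printed above with the value the new column actually passes to holidays=, which wraps that list in one more pair of brackets. That extra level may be what NumPy is rejecting.

```python
from datetime import date

# Hand-built stand-in for holidays.US(years=2022): date -> holiday name.
fake_us_2022 = {
    date(2022, 1, 1): "New Year's Day",
    date(2022, 1, 17): "Martin Luther King Jr. Day",
    date(2022, 10, 10): "Columbus Day",
    date(2022, 11, 24): "Thanksgiving",
    date(2022, 12, 25): "Christmas Day",
    date(2022, 12, 26): "Christmas Day (Observed)",
}
# Shortened stand-in for my_set.
wanted = {
    "New Year's Day",
    "Martin Luther King Jr. Day",
    "Thanksgiving",
    "Christmas Day",
    "Christmas Day (Observed)",
}


def nesting_depth(obj):
    """Return how many levels of list nesting obj has (0 for a non-list)."""
    depth = 0
    while isinstance(obj, list):
        depth += 1
        obj = obj[0] if obj else None
    return depth


# The expression whose output is printed above: a flat list of dates.
flat = list({k: v for k, v in fake_us_2022.items() if v in wanted})
# What holidays=[list({...})] passes: the same list inside another list.
wrapped = [flat]

print(nesting_depth(flat))     # 1
print(nesting_depth(wrapped))  # 2
```

By contrast, the holidays= argument in the working column is a list comprehension that yields dates directly, so its outer brackets do not add a second level.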