0

I generated a new column displaying the 7th business day of the year and month from this df:

      YearOfSRC MonthNumberOfSRC    
0       2022             3    
1       2022             4          
2       2022             5             
3       2022             6            
4       2021             4   
... ... ... ...
20528   2022             1             
20529   2022             2             
20530   2022             3            
20531   2022             4             
20532   2022             5             

With this code:

df['PredictionDate'] = (pd
 .to_datetime(df[['YearOfSRC', 'MonthNumberOfSRC']]
                .set_axis(['year' ,'month'], axis=1)
                .assign(day=1)
              )
 .sub(pd.offsets.BusinessDay(1))
 .add(pd.offsets.BusinessDay(7))
)

To output this dataframe (df_final) with the new column, PredictionDate:

       YearOfSRC    MonthNumberOfSRC    PredictionDate
0       2022             3              2022-03-09
1       2022             4              2022-04-11
2       2022             5              2022-05-10
3       2022             6              2022-06-09
4       2021             4              2021-04-09
... ... ... ...
20528   2022             1              2022-01-11
20529   2022             2              2022-02-09
20530   2022             3              2022-03-09
20531   2022             4              2022-04-11
20532   2022             5              2022-05-10

(More details here)

However, I would like to make use of CustomBusinessDay and Python's Holiday package to modify the rows of PredictionDate where a holiday in the first week would push back the 7th business day by 1 business day. I know that CustomBusinessDay has a parameter for a holiday list so in a modular solution I would assign the list from the holiday library to that parameter. I know I could hard-code the added business day by increasing the day by 1 for all months where there is a holiday in the first week, but I would prefer a solution that is more dynamic. I have tried this instead of the above code but I get a KeyError:

df_final['PredictionDate'] = (pd
 .to_datetime(df_final[['YearOfSRC', 'MonthNumberOfSRC']]
                .set_axis(['year' ,'month'], axis=1)
                .assign(day=1)
              )
 .sub(df_final.apply(lambda x : pd.offsets.CustomBusinessDay(1, holidays = holidays.US(years = x['YearOfSRC']).keys())))
 .add(df_final.apply(lambda x: pd.offsets.CustomBusinessDay(7, holidays = holidays.US(years = x['YearOfSRC']).keys())))
)

KeyError: 'YearOfSRC'

I'm sure I am implementing pandas apply and lambda functions incorrectly here, but I don't know why the error would be a key error when that's clearly a column in df_final.

halfer
  • 19,824
  • 17
  • 99
  • 186
Hefe
  • 421
  • 3
  • 23
  • 1
    You probably just need to pass `axis=1` to your call to `apply()`. By default, `apply` runs over each column, so the Index of each Series being passed to the function is the Index of your dataframe, which is likely some sort of integer range. Passing `axis=1` passes each row, where the Index is your columns, and then you can access the columns by name. – whege Aug 11 '22 at 21:12

1 Answers1

1

Per my comment above, try this:

df_final['PredictionDate'] = (pd
 .to_datetime(df_final[['YearOfSRC', 'MonthNumberOfSRC']]
                .set_axis(['year' ,'month'], axis=1)
                .assign(day=1)
              )
 .sub(df_final.apply(lambda x : pd.offsets.CustomBusinessDay(1, holidays = holidays.US(year = x['YearOfSRC']).items()), axis=1))
 .add(df_final.apply(lambda x: pd.offsets.CustomBusinessDay(7, holidays = holidays.US(year = x['YearOfSRC']).items()), axis=1))
)
whege
  • 1,391
  • 1
  • 5
  • 13
  • Wow, simple solution. I will try it out tomorrow. If it works you just got the easiest 100 points of your life. Have you tested it out yourself? Seems like a lot going on within the `.sub` and `.add` calls to actually pull it off. This was my first try and honestly just a guess, so if I got it correct while only missing the `axis=1` part, I would be pretty pleased with myself. – Hefe Aug 12 '22 at 00:45
  • Yes that did work. Had to change `.items()` to `.keys()` in the `CustomBusinessDay` call as well as `year` to `years` in the `holidays.US()` call, but all in all, your's was the solution. High five. – Hefe Aug 12 '22 at 01:03
  • I do get this little warning but it still works: `C:\Users\JWeinstein\Anaconda3\lib\site-packages\pandas\core\arrays\datetimelike.py:1189: PerformanceWarning: Adding/subtracting object-dtype array to DatetimeArray not vectorized. warnings.warn(` I think this is because the `pd.to_datetime()` call for `YearOfSRC` and `MonthNumberOfSRC` is contained and doesn't hold over through the `apply.` Any tips? – Hefe Aug 12 '22 at 01:12
  • 1
    @Julien apologies for the late response. Glad to hear it worked! I didn't try this example specifically, but I have been dealing with this situation in Pandas myself recently. I'm not sure why that warning is appearing, though my approach would be to break the whole line into multiple steps and validate that each one is working as intended. – whege Aug 15 '22 at 20:57