I would like to mark the days in my time series (data from China) in an extra column as holiday (boolean True) or non-holiday (boolean False).
I am new to this topic and am still trying to figure out how to approach the problem.
I have the following days for 2020 as official Chinese holidays:
As far as I know, pandas has no out-of-the-box calendar for China, so I will have to create a custom calendar as follows:
from pandas.tseries.holiday import Holiday, AbstractHolidayCalendar

class ChineseHolidays(AbstractHolidayCalendar):
    rules = [
        Holiday('Chinese New Year', month=1, day=25),
        # Question: how to add more than one day?
        # ... further holidays
    ]

cal = ChineseHolidays()
The next step would be to create the Holidays column as follows:
holidays = cal.holidays(start=X['timestamp'].min(), end=X['timestamp'].max())
# assign() returns a new DataFrame, so the result has to be kept; isin() already yields booleans
X = X.assign(Holidays=X['timestamp'].isin(holidays))
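One thing I am unsure about is whether isin() still matches when the timestamps carry a time of day, since the holiday dates are at midnight. Below is a minimal sketch of how I understand it, using a hypothetical three-row frame in place of my real X and a hand-written DatetimeIndex in place of the cal.holidays(...) result (both are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical sample data standing in for X; timestamps include a time of day.
X = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-01-23 08:00",
        "2020-01-25 14:30",
        "2020-01-26 09:15",
    ])
})

# Stand-in for the result of cal.holidays(...): holiday dates at midnight.
holidays = pd.DatetimeIndex(["2020-01-25"])

# Normalize each timestamp to midnight so it can compare equal to a holiday date,
# then test membership; the result is a boolean column.
X["Holidays"] = X["timestamp"].dt.normalize().isin(holidays)
print(X)
```

If the timestamps are already plain dates at midnight, the normalize() step should not change anything.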
My questions are:
1) Is this a sound approach in general?
2) How can I specify in the line Holiday('Chinese New Year', month=1, day=25) that the days off start on 24 January and end on 30 January? Is there a way to define a range of days off instead of just a single day?
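To make question 2 concrete, here is a rough sketch of the only workaround I can think of so far: generating one fixed-date Holiday rule per day off. The class name ChineseHolidays2020 and the per-day rule names are my own placeholders, and I do not know whether this is the intended way to use the API:

```python
import pandas as pd
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday

# Days off for Chinese New Year 2020 (24 to 30 January, as stated above).
new_year_days = pd.date_range("2020-01-24", "2020-01-30", freq="D")

class ChineseHolidays2020(AbstractHolidayCalendar):
    # One rule per day; passing year= pins each rule to a single calendar date.
    rules = [
        Holiday(f"Chinese New Year {d.date()}", year=d.year, month=d.month, day=d.day)
        for d in new_year_days
    ]

cal = ChineseHolidays2020()
holidays = cal.holidays(start="2020-01-01", end="2020-12-31")
print(holidays)
```

This only covers 2020, because the lunar dates shift every year, so each year would need its own list of days.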
Thanks for your help.
Best,
B.