I read in a DataFrame with around 80.000 observations taken every 15 mins (= 96 data records per day). The objective is to create a new column which indicates (with 1 or 0) that the records are taken at a holiday.
df = pd.read_csv('data_prepared.csv', sep=',', index_col=10, parse_dates=True)
Out[49]:
datetime H1 H2 ... H7 H8 Hges
MEZ/MESZ ...
2018-04-01 00:15:00 2018-03-31 22:15:00 25.0 47.0 ... 47.0 45.0 241.0
2018-04-01 00:30:00 2018-03-31 22:30:00 42.0 45.0 ... 37.0 35.0 228.0
2018-04-01 00:45:00 2018-03-31 22:45:00 25.0 50.0 ... 42.0 35.0 205.0
2018-04-01 01:00:00 2018-03-31 23:00:00 34.0 68.0 ... 46.0 88.0 301.0
2018-04-01 01:15:00 2018-03-31 23:15:00 27.0 45.0 ... 44.0 168.0 352.0
...
In Pandas: Checking if a date is a holiday and assigning boolean value I saw how to assign a True/False value for every observation when it was taken at a holiday. So I tried to recreate this:
dr = df.index
df = pd.DataFrame()
df['Date'] = dr
I created a csv, where the Bavarian holidays are collected:
b_hol = pd.read_csv(r'Data/Dt_Feiertage.csv', encoding= 'unicode_escape', sep=';', index_col=0, parse_dates=True)
holidays = b_hol.index
holidays = holidays.strftime('%Y-%m-%d')
holidays[0:20]
Out[52]:
Index(['2018-01-01', '2018-01-06', '2018-03-30', '2018-04-02', '2018-05-01',
'2018-05-10', '2018-05-21', '2018-05-31', '2018-08-15', '2018-10-03',
'2018-11-01', '2018-12-25', '2018-12-26', '2019-01-01', '2019-01-06',
'2019-04-19', '2019-04-22', '2019-05-01', '2019-05-30', '2019-06-10'],
dtype='object', name='Datum')
With .isin I ask the DataFrame whether the observation was taken at a holiday:
df['Holiday'] = df['Date'].isin(holidays)
The code works fine so far, but my problem is, that e.g. for the '2018-04-02' I want that every observation is marked with True and not only the first timestamp, as you can see here:
df[90:100]
Out[56]:
Date Holiday
90 2018-04-01 22:45:00 False
91 2018-04-01 23:00:00 False
92 2018-04-01 23:15:00 False
93 2018-04-01 23:30:00 False
94 2018-04-01 23:45:00 False
95 2018-04-02 00:00:00 True
96 2018-04-02 00:15:00 False
97 2018-04-02 00:30:00 False
98 2018-04-02 00:45:00 False
99 2018-04-02 01:00:00 False
Instead I want to have the following:
Date Holiday
90 2018-04-01 22:45:00 False
91 2018-04-01 23:00:00 False
92 2018-04-01 23:15:00 False
93 2018-04-01 23:30:00 False
94 2018-04-01 23:45:00 False
95 2018-04-02 00:00:00 True
96 2018-04-02 00:15:00 True
97 2018-04-02 00:30:00 True
98 2018-04-02 00:45:00 True
99 2018-04-02 01:00:00 True