Replace NaN in pandas DataFrame at certain dates (upsampling)

Question

I'm new to Python and I'm struggling with the following example: I have a pandas DataFrame with a DatetimeIndex and a column of feast days (public holidays). The data is in daily resolution.

import pandas as pd
import holidays

hd = holidays.Switzerland(years=[2018])
f = pd.DataFrame(hd.items())
f.columns = ['date', 'feastday']
f['date'] = pd.to_datetime(f['date'])
f = f.set_index('date')

This looks like this:

date                feastday        
2018-01-01      Neujahrestag
2018-04-01            Ostern
2018-03-30        Karfreitag
2018-04-02       Ostermontag
2018-05-10          Auffahrt
2018-05-20         Pfingsten
2018-05-21     Pfingstmontag
2018-08-01  Nationalfeiertag
2018-12-25       Weihnachten

Now I want the data not in daily resolution but in for example 6H resolution:

f1 = f.resample('6H').asfreq()

That works as I wished and leads to:

date                     feastday        
2018-01-01 00:00:00  Neujahrestag
2018-01-01 06:00:00           NaN
2018-01-01 12:00:00           NaN
2018-01-01 18:00:00           NaN
2018-01-02 00:00:00           NaN
2018-01-02 06:00:00           NaN
2018-01-02 12:00:00           NaN

But now I want to fill, for example, 'Neujahrestag' for all of 2018-01-01 and not only for the first row. The result should look like this (not only for 'Neujahrestag' but for all items in my DataFrame f). All rows with the same date should have the same value in feastday; the time of day doesn't matter:

date                     feastday
2018-01-01 00:00:00  Neujahrestag
2018-01-01 06:00:00  Neujahrestag
2018-01-01 12:00:00  Neujahrestag
2018-01-01 18:00:00  Neujahrestag
2018-01-02 00:00:00           NaN
2018-01-02 06:00:00           NaN
2018-01-02 12:00:00           NaN

I can replace one item manually by:

f1['2018-01-01'] = f1['2018-01-01']['feastday'][0]

That works without problems, but I can't get it to run automatically for all the data. I tried a for loop but didn't succeed. Can anybody help me? Maybe there is also another (simpler) way to reach my goal? Thanks in advance for your help.

Marco

score 2 · Answer 1 · answered Mar 08 '19 at 18:05

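Grouping by calendar date with the df.groupby(df.index.date) pattern is one way to do this. Use index.date rather than index.day: index.day is only the day of the month, so it would put, for example, every 1st of a month into the same group.

f1 = f.resample('6H').asfreq()
res = f1.groupby(f1.index.date).ffill()[['feastday']]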
res.head(7)
                         feastday
date
2018-01-01 00:00:00  Neujahrestag
2018-01-01 06:00:00  Neujahrestag
2018-01-01 12:00:00  Neujahrestag
2018-01-01 18:00:00  Neujahrestag
2018-01-02 00:00:00           NaN
2018-01-02 06:00:00           NaN
2018-01-02 12:00:00           NaN

Peter Leimbigler

Thanks for this idea Peter. I only had to change from df.index.day to df.index.date then it works fine. – M. Schmid Mar 08 '19 at 22:13
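
The difference between index.day and index.date mentioned in this comment can be illustrated with a minimal sketch. The two-row frame below is a stand-in for the holidays data (built by hand so the example does not need the holidays package), and the names by_day and by_date are only illustrative.

```python
import pandas as pd

# Stand-in for the question's frame: two holidays that share the same
# day of the month (the 1st).
f = pd.DataFrame(
    {"feastday": ["Neujahrestag", "Nationalfeiertag"]},
    index=pd.to_datetime(["2018-01-01", "2018-08-01"]),
)
f1 = f.resample("6h").asfreq()

# index.day is the day of the month (1..31): every "1st" lands in one group,
# so the forward fill leaks from 2018-01-01 into 2018-02-01, 2018-03-01, ...
by_day = f1.groupby(f1.index.day)["feastday"].ffill()

# index.date is the full calendar date: one group per actual day.
by_date = f1.groupby(f1.index.date)["feastday"].ffill()

print(by_day.loc["2018-02-01"])   # filled with the January holiday
print(by_date.loc["2018-02-01"])  # stays NaN
```

Grouping by index.date keeps each holiday confined to its own day, which is the behaviour asked for in the question.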

score 1 · Accepted Answer · answered Mar 08 '19 at 18:06

In this particular case, use .ffill with the limit argument: your frequency is 6 hours and there are 24 hours in a day, so each day spans four rows and filling at most 3 rows forward covers exactly one day.

df.resample('6H').ffill(limit=3)

#                         feastday
#date                             
#2018-01-01 00:00:00  Neujahrestag
#2018-01-01 06:00:00  Neujahrestag
#2018-01-01 12:00:00  Neujahrestag
#2018-01-01 18:00:00  Neujahrestag
#2018-01-02 00:00:00           NaN
#2018-01-02 06:00:00           NaN
#2018-01-02 12:00:00           NaN
#...
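
The limit of 3 follows from the frequency, so it can also be derived instead of hard-coded. The following is a minimal sketch, assuming the daily timestamps sit at midnight and the target frequency divides a day evenly. The small frame is a hand-built stand-in for the holidays data, and steps_per_day is an illustrative name.

```python
import pandas as pd

# Stand-in for the question's frame, using two holidays from the listing above.
f = pd.DataFrame(
    {"feastday": ["Karfreitag", "Ostern"]},
    index=pd.to_datetime(["2018-03-30", "2018-04-01"]),
)

freq = "6h"
# Number of rows one day occupies at the target frequency (4 for 6 hours).
steps_per_day = pd.Timedelta("1D") // pd.Timedelta(freq)

# The first row of each day already holds the value, so fill the remaining ones.
res = f.resample(freq).ffill(limit=steps_per_day - 1)
print(res)
```

With another evenly dividing frequency such as "4h" or "12h", only freq needs to change.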

In general, you could use groupby with transform if the frequency does not divide the day evenly.

df = df.resample('6H').asfreq()
df.groupby(df.index.date).transform('first')
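
When the frequency does not divide a day evenly, the new grid may not contain every original midnight timestamp, in which case asfreq() has no row to carry that holiday. One possible alternative, which is not taken from the answers here, is to build the target index directly and look each timestamp up by its date. This is a minimal sketch on a hand-built stand-in frame; idx, labels and the 5-hour frequency are illustrative choices.

```python
import pandas as pd

# Stand-in for the question's frame, using two holidays from the listing above.
f = pd.DataFrame(
    {"feastday": ["Karfreitag", "Ostern"]},
    index=pd.to_datetime(["2018-03-30", "2018-04-01"]),
)

# Target index at a frequency that does not divide 24 hours evenly.
idx = pd.date_range("2018-03-30", "2018-04-02", freq="5h")

# normalize() resets each timestamp to midnight, which is the key used in f.
# Dates without a holiday are simply missing from f and come back as NaN.
labels = f["feastday"].reindex(idx.normalize()).to_numpy()
res = pd.DataFrame({"feastday": labels}, index=idx)
print(res)
```

Because the lookup uses only the date part, it does not depend on where the grid points fall within the day.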

Thanks a lot, that was exactly what I needed. Works fine. – M. Schmid Mar 08 '19 at 22:11
