1

I have a data frame with 'Date','countyFIPS','Topic_label' and 'Score columns'. I want to aggregate data to 3-day level for each county for each topic in 'Topic_label'. There are missing dates in Date columns. I need continuos date column by imputing missing dates. For each date, three days, including that day's rolling average score for each topic, need to be calculated. We need to gather three days of data for each topic and take average of scores.

df=pd.DataFrame({'countyFIPS':[1001,1001,1001,1001,1001,
           1002,1002,1002,1002,1002],
          'Date':['2020-08-16','2020-08-14','2020-08-14','2020-08-14',
           '2020-08-14','2020-08-05','2020-08-07','2020-08-07',
           '2020-08-05','2020-08-05'],
          'Topic_Label':['Health','Business','Health','Business',
           'Health','Politics','Health','Health','Politics','Business'],
          'Score':[1,1,2,1,3,4,2,1,3,4]},
        )

Desired output

df_desired=pd.DataFrame({'countyFIPS':[1001,1001,1001,1001,1001,1001,1001,1001,1001,
                        1002,1002,1002,1002,1002,1002,1002,1002,1002],
          'Date':['2020-08-14','2020-08-14','2020-08-14','2020-08-15','2020-08-15','2020-08-15',
                  '2020-08-16','2020-08-16','2020-08-16','2020-08-05','2020-08-05','2020-08-05',
                  '2020-08-06','2020-08-06','2020-08-06','2020-08-07','2020-08-07','2020-08-07'],
          'Topic_Label':['Health','Business','Politics','Health','Business','Politics',
                         'Health','Business','Politics','Health','Business','Politics',
                         'Health','Business','Politics','Health','Business','Politics'],
          '3 day rolling average':[2.5,1,0,2.5,1,0,2,1,0,0,4,3.5,0,4,3.5,1.5,4,3.5]},
        )

Thanks in advance!

NAS_2339
  • 353
  • 2
  • 13

0 Answers0