1

Hey I have seen a lot of issues answered where the max and min date is constant accross the output. But what about filling in dates for each ID where you only what to fill between each IDs max and min date. Say for example this is the dataframe

x = pandas.DataFrame({'user': ['a','a','b','b','a'], 'dt': ['2016-01-01','2016-01-02', '2016-01-05','2016-01-09','2016-01-06'], 'val': [1,33,2,1,2]})

The desired output is

date user val
2016-01-01 a 1.0
2016-01-02 a 33.0
2016-01-03 a 0.0
2016-01-04 a 0.0
2016-01-05 a 0.0
2016-01-06 a 2.0
2016-01-05 b 2.0
2016-01-06 b 0.0
2016-01-07 b 0.0
2016-01-08 b 0.0
2016-01-09 b 1.0

Currently using this code

udates=x['dt'].unique()
x['dt'] = pandas.to_datetime(x['dt'])


filled_df = (x.set_index('dt')
         .groupby('user')
         .apply(lambda d: d.reindex(pd.date_range(min(x.dt),
                                                  max(x.dt),
                                                  freq='D')))
         .drop('user', axis=1)
         .reset_index('user')
         .fillna(0)) 

Current output backfills dates for B before its min date

date user val
2016-01-01 a 1.0
2016-01-02 a 33.0
2016-01-03 a 0.0
2016-01-04 a 0.0
2016-01-05 a 0.0
2016-01-06 a 2.0
2016-01-01 b 0.0
2016-01-02 b 0.0
2016-01-03 b 0.0
2016-01-04 b 0.0
2016-01-05 b 2.0
2016-01-06 b 0.0
2016-01-07 b 0.0
2016-01-08 b 0.0
2016-01-09 b 1.0

Help is very appreciated!!!

1 Answers1

1

Building on your solution, I just replaced min(x.dt) with min(d.index):

import pandas as pd
x = pd.DataFrame({'user': ['a','a','b','b','a'], 'dt': ['2016-01-01','2016-01-02', '2016-01-05','2016-01-09','2016-01-06'], 'val': [1,33,2,1,2]})
x['dt'] = pd.to_datetime(x['dt'])


filled_df = (x.set_index('dt')
         .groupby('user')
         .apply(lambda d: d.reindex(pd.date_range(min(d.index),
                                                  max(x.dt),
                                                  freq='D')))
         .drop('user', axis=1)
         .reset_index('user')
         .fillna(0))

output

>>> filled_df
           user   val
2016-01-01    a   1.0
2016-01-02    a  33.0
2016-01-03    a   0.0
2016-01-04    a   0.0
2016-01-05    a   0.0
2016-01-06    a   2.0
2016-01-07    a   0.0
2016-01-08    a   0.0
2016-01-09    a   0.0
2016-01-05    b   2.0
2016-01-06    b   0.0
2016-01-07    b   0.0
2016-01-08    b   0.0
2016-01-09    b   1.0
zap
  • 568
  • 2
  • 13
  • min(x.dt) and max(x.dt) with min(d.index) and max(d.index) makes it work for a wider range of dates also. Thanks! – dataaltorabbit Jun 30 '21 at 19:04
  • Answer output does not match the desired output. I guess changing max(x.dt) to max(d.index) will do the job. – Telis May 03 '23 at 07:54