Python Fill in missing dates for grouped IDs, where each ID has it's own min and max date

Question

Hey I have seen a lot of issues answered where the max and min date is constant accross the output. But what about filling in dates for each ID where you only what to fill between each IDs max and min date. Say for example this is the dataframe

x = pandas.DataFrame({'user': ['a','a','b','b','a'], 'dt': ['2016-01-01','2016-01-02', '2016-01-05','2016-01-09','2016-01-06'], 'val': [1,33,2,1,2]})

The desired output is

date	user	val
2016-01-01	a	1.0
2016-01-02	a	33.0
2016-01-03	a	0.0
2016-01-04	a	0.0
2016-01-05	a	0.0
2016-01-06	a	2.0
2016-01-05	b	2.0
2016-01-06	b	0.0
2016-01-07	b	0.0
2016-01-08	b	0.0
2016-01-09	b	1.0

Currently using this code

udates=x['dt'].unique()
x['dt'] = pandas.to_datetime(x['dt'])


filled_df = (x.set_index('dt')
         .groupby('user')
         .apply(lambda d: d.reindex(pd.date_range(min(x.dt),
                                                  max(x.dt),
                                                  freq='D')))
         .drop('user', axis=1)
         .reset_index('user')
         .fillna(0))

Current output backfills dates for B before its min date

date	user	val
2016-01-01	a	1.0
2016-01-02	a	33.0
2016-01-03	a	0.0
2016-01-04	a	0.0
2016-01-05	a	0.0
2016-01-06	a	2.0
2016-01-01	b	0.0
2016-01-02	b	0.0
2016-01-03	b	0.0
2016-01-04	b	0.0
2016-01-05	b	2.0
2016-01-06	b	0.0
2016-01-07	b	0.0
2016-01-08	b	0.0
2016-01-09	b	1.0

Help is very appreciated!!!

score 1 · Accepted Answer · answered Jun 30 '21 at 18:15

Building on your solution, I just replaced min(x.dt) with min(d.index):

import pandas as pd
x = pd.DataFrame({'user': ['a','a','b','b','a'], 'dt': ['2016-01-01','2016-01-02', '2016-01-05','2016-01-09','2016-01-06'], 'val': [1,33,2,1,2]})
x['dt'] = pd.to_datetime(x['dt'])


filled_df = (x.set_index('dt')
         .groupby('user')
         .apply(lambda d: d.reindex(pd.date_range(min(d.index),
                                                  max(x.dt),
                                                  freq='D')))
         .drop('user', axis=1)
         .reset_index('user')
         .fillna(0))

output

>>> filled_df
           user   val
2016-01-01    a   1.0
2016-01-02    a  33.0
2016-01-03    a   0.0
2016-01-04    a   0.0
2016-01-05    a   0.0
2016-01-06    a   2.0
2016-01-07    a   0.0
2016-01-08    a   0.0
2016-01-09    a   0.0
2016-01-05    b   2.0
2016-01-06    b   0.0
2016-01-07    b   0.0
2016-01-08    b   0.0
2016-01-09    b   1.0

min(x.dt) and max(x.dt) with min(d.index) and max(d.index) makes it work for a wider range of dates also. Thanks! — dataaltorabbit, Jun 30 '21 at 19:04
Answer output does not match the desired output. I guess changing max(x.dt) to max(d.index) will do the job. — Telis, May 03 '23 at 07:54

Python Fill in missing dates for grouped IDs, where each ID has it's own min and max date

1 Answers1

Linked