Hey I have seen a lot of issues answered where the max and min date is constant accross the output. But what about filling in dates for each ID where you only what to fill between each IDs max and min date. Say for example this is the dataframe
x = pandas.DataFrame({'user': ['a','a','b','b','a'], 'dt': ['2016-01-01','2016-01-02', '2016-01-05','2016-01-09','2016-01-06'], 'val': [1,33,2,1,2]})
The desired output is
date | user | val |
---|---|---|
2016-01-01 | a | 1.0 |
2016-01-02 | a | 33.0 |
2016-01-03 | a | 0.0 |
2016-01-04 | a | 0.0 |
2016-01-05 | a | 0.0 |
2016-01-06 | a | 2.0 |
2016-01-05 | b | 2.0 |
2016-01-06 | b | 0.0 |
2016-01-07 | b | 0.0 |
2016-01-08 | b | 0.0 |
2016-01-09 | b | 1.0 |
Currently using this code
udates=x['dt'].unique()
x['dt'] = pandas.to_datetime(x['dt'])
filled_df = (x.set_index('dt')
.groupby('user')
.apply(lambda d: d.reindex(pd.date_range(min(x.dt),
max(x.dt),
freq='D')))
.drop('user', axis=1)
.reset_index('user')
.fillna(0))
Current output backfills dates for B before its min date
date | user | val |
---|---|---|
2016-01-01 | a | 1.0 |
2016-01-02 | a | 33.0 |
2016-01-03 | a | 0.0 |
2016-01-04 | a | 0.0 |
2016-01-05 | a | 0.0 |
2016-01-06 | a | 2.0 |
2016-01-01 | b | 0.0 |
2016-01-02 | b | 0.0 |
2016-01-03 | b | 0.0 |
2016-01-04 | b | 0.0 |
2016-01-05 | b | 2.0 |
2016-01-06 | b | 0.0 |
2016-01-07 | b | 0.0 |
2016-01-08 | b | 0.0 |
2016-01-09 | b | 1.0 |
Help is very appreciated!!!