Fill in missing dates and add lists together from column by row and front fill in pandas dataframe

Question

I'd like to fill in the following dataframe with dates between a range and front fill all the columns. As I am completing this, I would like to append the lists in the Wells column so it becomes continues to add items by date.

Dataframe to expand by date

   StartDate     Wells                     count Sum_Cumm   vol
0   1967-10-01  [MUN-523, MUN-354, MUN-2660] 50   50    8503.323620
1   1968-01-01  [MUN-152]                     1   51    8336.591784
2   1968-03-01  [MUN-1032]                    1   52    8176.272712
3   1968-10-01  [MUN-16128]                   1   53    9191.110200

Code that I'm working on

newdf = (newdf.set_index('StartDate').reindex(pd.date_range('10-01-1967', '12-31-1994', freq='MS')).rename_axis(['StartDate']).reset_index()).ffill(newdf['vol'])

Dataframe that I'd like to end up with

 StartDate   Wells                                        count Sum_Cumm    vol
0   1967-10-01  [MUN-523, MUN-354, MUN-2660]                     50   50    8503.323620
1   1967-11-01  [MUN-523, MUN-354, MUN-2660]                     1    51    8503.323620    
2   1967-12-01  [MUN-523, MUN-354, MUN-2660]                     1    51    8503.323620
3   1968-01-01  [MUN-523, MUN-354, MUN-2660,MUN-152]             1    52    8336.591784
4   1968-02-01  [MUN-523, MUN-354, MUN-2660,MUN-152]             1    53    8336.591784
5   1968-03-01  [MUN-523, MUN-354, MUN-2660,MUN-152,MUN-1032]    1    53    8176.272712
6   1968-04-01  [MUN-523, MUN-354, MUN-2660,MUN-152,MUN-1032]    1    53    8176.272712

This looks like a good use-case for [Resample](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html) — G. Anderson, Feb 27 '23 at 23:21

SomeDude · Accepted Answer · 2023-02-28T01:02:08.160

You can use period_range to create a new index and reindex the existing df to create a new dataframe and then ffill.

For Wells column do a cumsum and then apply np.unique

new_idx = pd.to_datetime(
    pd.period_range(df["StartDate"].min(), df["StartDate"].max(), freq="M")
    .asfreq("D", how="S")
    .strftime("%Y-%m-%d")
)
df2 = df.set_index("StartDate").reindex(new_idx)
df2 = df2.ffill(downcast="infer")
df2["Wells"] = df2["Wells"].cumsum().apply(np.unique)
df2 = df2.rename_axis("StartDate").reset_index()

print(df2)

    StartDate                                              Wells  count  \
0  1967-10-01                       [MUN-2660, MUN-354, MUN-523]     50   
1  1967-11-01                       [MUN-2660, MUN-354, MUN-523]     50   
2  1967-12-01                       [MUN-2660, MUN-354, MUN-523]     50   
3  1968-01-01              [MUN-152, MUN-2660, MUN-354, MUN-523]      1   
4  1968-02-01              [MUN-152, MUN-2660, MUN-354, MUN-523]      1   
5  1968-03-01    [MUN-1032, MUN-152, MUN-2660, MUN-354, MUN-523]      1   
6  1968-04-01    [MUN-1032, MUN-152, MUN-2660, MUN-354, MUN-523]      1   
7  1968-05-01    [MUN-1032, MUN-152, MUN-2660, MUN-354, MUN-523]      1   
8  1968-06-01    [MUN-1032, MUN-152, MUN-2660, MUN-354, MUN-523]      1   
9  1968-07-01    [MUN-1032, MUN-152, MUN-2660, MUN-354, MUN-523]      1   
10 1968-08-01    [MUN-1032, MUN-152, MUN-2660, MUN-354, MUN-523]      1   
11 1968-09-01    [MUN-1032, MUN-152, MUN-2660, MUN-354, MUN-523]      1   
12 1968-10-01  [MUN-1032, MUN-152, MUN-16128, MUN-2660, MUN-3...      1   

    Sum_Cumm          vol  
0         50  8503.323620  
1         50  8503.323620  
2         50  8503.323620  
3         51  8336.591784  
4         51  8336.591784  
5         52  8176.272712  
6         52  8176.272712  
7         52  8176.272712  
8         52  8176.272712  
9         52  8176.272712  
10        52  8176.272712  
11        52  8176.272712  
12        53  9191.110200

Fill in missing dates and add lists together from column by row and front fill in pandas dataframe

1 Answers1