Cumulative sum problem considering data till last record with multiple IDs

Question

I have a dataset with multiple IDs and dates, for which I have created a cumulative supply column in Python.

My data is as follows:

SKU Date    Demand  Supply  Cum_Supply
1   20160207    6   2       2
1   20160214    5   0       2
1   20160221    1   0       2
1   20160228    6   0       2
1   20160306    1   0       2
1   20160313    101 0       2
1   20160320    1   0       2
1   20160327    1   0       2
2   20160207    0   0       0
2   20160214    0   0       0
2   20160221    2   0       0
2   20160228    2   0       0
2   20160306    2   0       0
2   20160313    1   0       0
2   20160320    1   0       0
2   20160327    1   0       0

Here Cum_Supply was calculated by:

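import numpy as np
import pandas as pd

# Build the full (Date, SKU) grid so missing combinations become zero rows
idx = pd.MultiIndex.from_product([np.unique(data.Date), data.SKU.unique()])

data2 = data.set_index(['Date', 'SKU']).reindex(idx).fillna(0)
# Cumulative sums per SKU (index level 1), appended as Cum_* columns
data2 = pd.concat([data2, data2.groupby(level=1).cumsum().add_prefix('Cum_')], axis=1).sort_index(level=1).reset_index()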

I want to create a column 'True_Demand', which is the maximum unfulfilled demand up to that date plus the cumulative supply, i.e. max(Demand - Supply) + Cum_Supply.
So my output would look something like this:

SKU Date        Demand  Supply  Cum_Supply  True_Demand
1   20160207    6          2        2       6
1   20160214    5          0        2       7
1   20160221    1          0        2       7
1   20160228    6          0        2       8
1   20160306    1          0        2       8
1   20160313    101        0        2       103
1   20160320    1          0        2       103
1   20160327    1          0        2       103
2   20160207    0          0        0       0
2   20160214    0          0        0       0
2   20160221    2          0        0       2
2   20160228    2          0        0       2
2   20160306    2          0        0       2
2   20160313    1          0        0       2
2   20160320    1          0        0       2
2   20160327    1          0        0       2

So for the 3rd record (20160221), the maximum unfulfilled demand up to that date was 5 (on 20160214). The True_Demand is therefore 5 + 2 = 7, even though the unfulfilled demand on that date alone would give only 1 + 2 = 3.

Code for the dataframe

data = pd.DataFrame(
    {'SKU': [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2],
     'Date': [20160207, 20160214, 20160221, 20160228, 20160306, 20160313, 20160320, 20160327,
              20160207, 20160214, 20160221, 20160228, 20160306, 20160313, 20160320, 20160327],
     'Demand': [6, 5, 1, 6, 1, 101, 1, 1, 0, 0, 2, 2, 2, 1, 1, 1],
     'Supply': [2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]},
    columns=['Date', 'SKU', 'Demand', 'Supply'])

You have two separate, unrelated problems. Can you please focus on one problem at a time (one problem per question)? — cs95, Dec 26 '18 at 18:29

score 1 · Accepted Answer · answered Dec 26 '18 at 20:23

Would you try this pretty fun one-liner?

(data.groupby('SKU', 
              as_index=False, 
              group_keys=False)
     .apply(lambda x: 
            x.assign(Cum_Supply=x.Supply.cumsum())
             .pipe(lambda x: 
                   x.assign(True_Demand = (x.Demand - x.Supply + x.Cum_Supply).cummax()))))

Output:

        Date  SKU  Demand  Supply  Cum_Supply  True_Demand
0   20160207    1       6       2           2            6
1   20160214    1       5       0           2            7
2   20160221    1       1       0           2            7
3   20160228    1       6       0           2            8
4   20160306    1       1       0           2            8
5   20160313    1     101       0           2          103
6   20160320    1       1       0           2          103
7   20160327    1       1       0           2          103
8   20160207    2       0       0           0            0
9   20160214    2       0       0           0            0
10  20160221    2       2       0           0            2
11  20160228    2       2       0           0            2
12  20160306    2       2       0           0            2
13  20160313    2       1       0           0            2
14  20160320    2       1       0           0            2
15  20160327    2       1       0           0            2
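
If you would rather avoid groupby.apply, the same result can probably be had with two grouped cumulative operations. This is only a sketch of an alternative, not the code from the answer above: it uses the same formula, (Demand - Supply + Cum_Supply) with a running maximum per SKU, and assumes the rows should be ordered by Date within each SKU. The helper name `unmet` is made up for illustration.

```python
import pandas as pd

data = pd.DataFrame(
    {'SKU': [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2],
     'Date': [20160207, 20160214, 20160221, 20160228, 20160306, 20160313, 20160320, 20160327,
              20160207, 20160214, 20160221, 20160228, 20160306, 20160313, 20160320, 20160327],
     'Demand': [6, 5, 1, 6, 1, 101, 1, 1, 0, 0, 2, 2, 2, 1, 1, 1],
     'Supply': [2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]},
    columns=['Date', 'SKU', 'Demand', 'Supply'])

# Assumption: cumulative values should follow date order within each SKU
data = data.sort_values(['SKU', 'Date'], ignore_index=True)

# Running total of supply per SKU
data['Cum_Supply'] = data.groupby('SKU')['Supply'].cumsum()

# Per-row quantity from the answer's formula (hypothetical helper name)
unmet = data['Demand'] - data['Supply'] + data['Cum_Supply']

# Running maximum of that quantity per SKU
data['True_Demand'] = unmet.groupby(data['SKU']).cummax()

print(data)
```

On the sample data this should give the same Cum_Supply and True_Demand columns as the output above.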

Thank you. This did work. Any chance this could fit in this formula. `idx = pd.MultiIndex.from_product([np.unique(data.Date), data.SKU.unique()]) data2 = data.set_index(['Date', 'SKU']).reindex(idx).fillna(0) data2 = pd.concat([data2, data2.groupby(level=1).cumsum().add_prefix('Cum_')],1).sort_index(level=1).reset_index()` I tried it but my results are not the same. I might have to bring in a 3rd parameter for calculating the Cumulative values. (Account, ISBN and Date). — Neil S, Dec 26 '18 at 23:01
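
Regarding the follow-up about a third parameter: one way this might extend is to group on a list of key columns instead of a single one. The sketch below assumes columns named Account and ISBN (names taken from the comment) and uses invented toy values, since the real data was not posted. It does not reproduce the reindex/fillna step from the question.

```python
import pandas as pd

# Hypothetical toy data: column names from the comment, values invented
df = pd.DataFrame({
    'Account': ['A', 'A', 'A', 'B', 'B', 'B'],
    'ISBN':    [10, 10, 10, 10, 10, 10],
    'Date':    [20160207, 20160214, 20160221] * 2,
    'Demand':  [6, 5, 1, 0, 2, 1],
    'Supply':  [2, 0, 0, 0, 0, 1],
})

keys = ['Account', 'ISBN']  # grouping keys; Date only sets the order

# Order rows by date within each (Account, ISBN) pair
df = df.sort_values(keys + ['Date'], ignore_index=True)

# Running supply per (Account, ISBN)
df['Cum_Supply'] = df.groupby(keys)['Supply'].cumsum()

# Same formula as the answer, with the running max taken per key pair
df['Unmet'] = df['Demand'] - df['Supply'] + df['Cum_Supply']
df['True_Demand'] = df.groupby(keys)['Unmet'].cummax()
df = df.drop(columns='Unmet')

print(df)
```

Only the `keys` list changes when more identifying columns are needed; the cumulative logic stays the same.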
