I am working on a problem to group rows by using cumulative sum of an attribute (after being ordered). But I am new to python and don't know how to process it. Please kindly advise. Any help is appreciated.
Here is my input, which is a panda dataframe I made. As you can see, neither the key nor the group is ordered.
key group v1 v2
1_A 1 22 4
1_A -1 10 11
1_B 2 15 9
1_B 6 15 2
1_A 2 33 43
1_A 5 50 22
1_A 3 5 122
1_B 1 30 8
1_A 4 1 2
For the data processing, I need to calculate cumulative sum of v1 order by group and it is for the rows with the same key. So I suppose I should order the table first. But I am not sure. Please suggest. If I need to order the table first, the new table is like this one shown below. Basically, I am putting rows with the same key together, and order these rows by group.
key group v1 v2
1_A -1 10 11
1_A 1 22 4
1_A 2 33 43
1_A 3 5 122
1_A 4 1 2
1_A 5 50 22
1_B 1 30 8
1_B 2 15 9
1_B 6 1 2
Here is my desired output. Mainly I need to do cumulative sum of v1 by order of group, once the cumulative sum reaches a threshold, say 30 here, the accumulation stops, and restart for the next row. This process keeps going until it reaches the last row of the same group. At the end, if the last bin(s) is smaller than 30, then combine them with the lower bin(s), which is shown in 1_B, where group 2 and 6 together makes only 16 (<30), so they need to be combined with group 1.
Note that the bin number can be different from what I have here. As long as it is giving the same bin number to the same group, it works. For example, you can totally replace 1,2,3 to A,B,C, or to 3,2,1, or to A100,B201,M434.
key group v1 v2 bin sum_v1 sum_v2
1_A -1 10 11 1 32 15
1_A 1 22 4 1 32 15
1_A 2 33 43 2 33 43
1_A 3 5 122 3 56 146
1_A 4 1 2 3 56 146
1_A 5 50 22 3 56 146
1_B 1 30 8 1 46 19
1_B 2 15 9 1 46 19
1_B 6 1 2 1 46 19
EDIT: Now I have a full solution posted below as an answer. Enjoy.