Conditional cumcount of values in second column

Question

I want to fill numbers in column flag, based on the value in column KEY.

Instead of using cumcount() to fill incremental numbers, I want to fill same number for every two rows if the value in column KEY stays same.
If the value in column KEY changes, the number filled changes also.

Here is the example, df1 is what I want from df0.

df0 = pd.DataFrame({'KEY':['0','0','0','0','1','1','1','2','2','2','2','2','3','3','3','3','3','3','4','5','6']})

df1 = pd.DataFrame({'KEY':['0','0','0','0','1','1','1','2','2','2','2','2','3','3','3','3','3','3','4','5','6'],
                    'flag':['0','0','1','1','2','2','3','4','4','5','5','6','7','7','8','8','9','9','10','11','12']})

I edited your title to be more clear, you can reedit if you like. Also, don't term things 'python' when you really mean 'pandas' ; only 12% of SO python questions are about pandas. — smci, Oct 26 '20 at 22:05
First thing, all your values are integers not strings, so can you edit your code to assign/convert them to integers? (or use `df1.apply(pd.to_numeric, axis=1)`) — smci, Oct 26 '20 at 22:11
Next, can you please edit your explanation to clarify, in words? This one is way too data-dependent on `flag` and `KEY`; should we always assume `KEY` is monotonic increasing and only changes by +1? Sounds like you want a conditional `cumcount()` on successive two-row chunks of `KEY`, by default `flag` also increments +1 every two rows, with an extra +1 when `KEY` increases? — smci, Oct 26 '20 at 22:16
@smci, thank you for your comments. My example here is a representation of my real data set. The KEY column is created by df['KEY'] = df.groupby([several feature columns]).ngroup(). The purpose is to differentiate rows into groups by the features. Then in each group, I try to make pairs of rows. — franky, Oct 27 '20 at 04:54
@David Erickson, thank you for your answer. It works as I wish. I do the right things. — franky, Oct 27 '20 at 14:27

score 5 · Accepted Answer · answered Oct 26 '20 at 21:50

You want to get the cumcount and add one. Then use %2 to differentiate between odd or even rows. Then, take the cumulative sum and subtract 1 to start counting from zero.

You can use:

df0['flag'] = ((df0.groupby('KEY').cumcount() + 1) % 2).cumsum() - 1
df0
Out[1]: 
   KEY  flag
0    0      0
1    0      0
2    0      1
3    0      1
4    1      2
5    1      2
6    1      3
7    2      4
8    2      4
9    2      5
10   2      5
11   2      6
12   3      7
13   3      7
14   3      8
15   3      8
16   3      9
17   3      9
18   4     10
19   5     11
20   6     12

Conditional cumcount of values in second column

1 Answers1