2

I want to fill numbers in column flag, based on the value in column KEY.

  • Instead of using cumcount() to fill incremental numbers, I want to fill same number for every two rows if the value in column KEY stays same.
  • If the value in column KEY changes, the number filled changes also.

Here is the example, df1 is what I want from df0.

df0 = pd.DataFrame({'KEY':['0','0','0','0','1','1','1','2','2','2','2','2','3','3','3','3','3','3','4','5','6']})

df1 = pd.DataFrame({'KEY':['0','0','0','0','1','1','1','2','2','2','2','2','3','3','3','3','3','3','4','5','6'],
                    'flag':['0','0','1','1','2','2','3','4','4','5','5','6','7','7','8','8','9','9','10','11','12']})
cottontail
  • 10,268
  • 18
  • 50
  • 51
franky
  • 45
  • 4
  • I edited your title to be more clear, you can reedit if you like. Also, don't term things 'python' when you really mean 'pandas' ; only 12% of SO python questions are about pandas. – smci Oct 26 '20 at 22:05
  • First thing, all your values are integers not strings, so can you edit your code to assign/convert them to integers? (or use `df1.apply(pd.to_numeric, axis=1)`) – smci Oct 26 '20 at 22:11
  • Next, can you please edit your explanation to clarify, in words? This one is way too data-dependent on `flag` and `KEY`; should we always assume `KEY` is monotonic increasing and only changes by +1? Sounds like you want a conditional `cumcount()` on successive two-row chunks of `KEY`, by default `flag` also increments +1 every two rows, with an extra +1 when `KEY` increases? – smci Oct 26 '20 at 22:16
  • @smci, thank you for your comments. My example here is a representation of my real data set. The KEY column is created by df['KEY'] = df.groupby([several feature columns]).ngroup(). The purpose is to differentiate rows into groups by the features. Then in each group, I try to make pairs of rows. – franky Oct 27 '20 at 04:54
  • @David Erickson, thank you for your answer. It works as I wish. I do the right things. – franky Oct 27 '20 at 14:27

1 Answers1

5

You want to get the cumcount and add one. Then use %2 to differentiate between odd or even rows. Then, take the cumulative sum and subtract 1 to start counting from zero.

You can use:

df0['flag'] = ((df0.groupby('KEY').cumcount() + 1) % 2).cumsum() - 1
df0
Out[1]: 
   KEY  flag
0    0      0
1    0      0
2    0      1
3    0      1
4    1      2
5    1      2
6    1      3
7    2      4
8    2      4
9    2      5
10   2      5
11   2      6
12   3      7
13   3      7
14   3      8
15   3      8
16   3      9
17   3      9
18   4     10
19   5     11
20   6     12
David Erickson
  • 16,433
  • 2
  • 19
  • 35