
I have the following DataFrame, and I am trying to label each block of rows with a number based on how many blocks of the same class (the Class column) have been seen so far. Consecutive rows with the same class value get the same number. If a block of the same class appears again later, the number is incremented. If a block of a new class appears, its numbering starts at 1. Rows where Class is NaN stay NaN.

import numpy as np
import pandas as pd
df = pd.DataFrame(list(zip(range(10, 30), range(20))), columns=['a', 'b'])
df['Class'] = [np.nan, np.nan, np.nan, np.nan, 'a', 'a', 'a', 'a', np.nan, np.nan, 'a', 'a', 'a', 'a', 'a', np.nan, np.nan, 'b', 'b', 'b']

     a   b Class
0   10   0   NaN
1   11   1   NaN
2   12   2   NaN
3   13   3   NaN
4   14   4     a
5   15   5     a
6   16   6     a
7   17   7     a
8   18   8   NaN
9   19   9   NaN
10  20  10     a
11  21  11     a
12  22  12     a
13  23  13     a
14  24  14     a
15  25  15   NaN
16  26  16   NaN
17  27  17     b
18  28  18     b
19  29  19     b

The expected output looks like this:

     a   b Class  block_encounter_no
0   10   0   NaN                 NaN
1   11   1   NaN                 NaN
2   12   2   NaN                 NaN
3   13   3   NaN                 NaN
4   14   4     a                   1
5   15   5     a                   1
6   16   6     a                   1
7   17   7     a                   1
8   18   8   NaN                 NaN
9   19   9   NaN                 NaN
10  20  10     a                   2
11  21  11     a                   2
12  22  12     a                   2
13  23  13     a                   2
14  24  14     a                   2
15  25  15   NaN                 NaN
16  26  16   NaN                 NaN
17  27  17     b                   1
18  28  18     b                   1
19  29  19     b                   1
jezrael
learner
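The numbering rule can also be written as a plain loop, which may help when checking a vectorized solution against it. This is a minimal sketch in pure Python, assuming missing labels arrive as None or float NaN; the function name block_encounter_numbers is hypothetical.

```python
import math


def block_encounter_numbers(classes):
    """Number each run of equal, non-missing labels by how many runs of
    that same label have started so far; missing labels map to None."""
    seen = {}    # label -> number of runs of that label started so far
    prev = None  # label of the previous row (None when it was missing)
    out = []
    for label in classes:
        # Treat None and float NaN as "missing".
        missing = label is None or (isinstance(label, float) and math.isnan(label))
        if missing:
            out.append(None)
            prev = None
            continue
        if label != prev:
            # A new run of this label starts here, so bump its counter.
            seen[label] = seen.get(label, 0) + 1
        out.append(seen[label])
        prev = label
    return out


nan = float("nan")
classes = [nan] * 4 + ["a"] * 4 + [nan] * 2 + ["a"] * 5 + [nan] * 2 + ["b"] * 3
print(block_encounter_numbers(classes))
```

Two adjacent blocks of different classes (for example "a" directly followed by "b") are treated as separate blocks, since the label changes.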

2 Answers


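Solution with mask: flag the rows where a new block starts, blank out the rows where Class is NaN, then take a cumulative sum within each class: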

# Parentheses let the chain span lines; float keeps the masked NaNs numeric.
df['block_encounter_no'] = ((df.Class != df.Class.shift()).astype(float)
                            .mask(df.Class.isnull())
                            .groupby(df.Class).cumsum())
print (df)
     a   b Class  block_encounter_no
0   10   0   NaN                 NaN
1   11   1   NaN                 NaN
2   12   2   NaN                 NaN
3   13   3   NaN                 NaN
4   14   4     a                 1.0
5   15   5     a                 1.0
6   16   6     a                 1.0
7   17   7     a                 1.0
8   18   8   NaN                 NaN
9   19   9   NaN                 NaN
10  20  10     a                 2.0
11  21  11     a                 2.0
12  22  12     a                 2.0
13  23  13     a                 2.0
14  24  14     a                 2.0
15  25  15   NaN                 NaN
16  26  16   NaN                 NaN
17  27  17     b                 1.0
18  28  18     b                 1.0
19  29  19     b                 1.0
jezrael
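To see why the mask solution works, here is a sketch that runs the same steps one at a time on the question's Class values (only that column is rebuilt, and the intermediate name starts is illustrative). It casts the flags to float before masking, an assumption added so the grouped cumulative sum operates on a numeric column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Class': [np.nan] * 4 + ['a'] * 4 + [np.nan] * 2
                            + ['a'] * 5 + [np.nan] * 2 + ['b'] * 3})

# Step 1: True where a row's Class differs from the row above, i.e. where
# a block starts. NaN != NaN is True, so missing rows are flagged as well.
starts = df.Class != df.Class.shift()

# Step 2: blank out the missing rows so they never count as a block start.
# Casting to float first keeps the values numeric (1.0 / 0.0 / NaN).
starts = starts.astype(float).mask(df.Class.isnull())

# Step 3: cumulative sum of block starts, restarted for each class.
# Rows whose Class is NaN belong to no group, so they come back as NaN.
df['block_encounter_no'] = starts.groupby(df.Class).cumsum()
print(df)
```

Within group a the flags are 1, 0, 0, 0 for the first block and 1, 0, 0, 0, 0 for the second, so the running sum is 1 across the first block and 2 across the second; group b starts its own sum at 1.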

Do this, using np.where to leave NaN wherever Class is missing:

# Running count of block starts among non-null rows; NaN where Class is missing.
df['block_encounter_no'] = \
    np.where(df.Class.notnull(),
             (df.Class.notnull() & (df.Class != df.Class.shift())).cumsum(),
             np.nan)


piRSquared
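One caveat with the np.where answer: its cumulative sum runs over all classes at once, so on the sample data the b block is numbered 3 rather than the expected 1. The sketch below, a minimal check assuming the question's Class values, compares that global count with a per-class count obtained by grouping the block-start flags by Class; global_count and per_class are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Class': [np.nan] * 4 + ['a'] * 4 + [np.nan] * 2
                            + ['a'] * 5 + [np.nan] * 2 + ['b'] * 3})

# Flag rows where a non-missing block starts.
starts = df.Class.notnull() & (df.Class != df.Class.shift())

# As in the np.where answer: one running total shared by every class.
global_count = np.where(df.Class.notnull(), starts.cumsum(), np.nan)

# Per-class variant: restart the running total for each class.
# Rows with a missing Class fall outside every group and come back as NaN.
per_class = starts.astype(float).groupby(df.Class).cumsum()

print(global_count[17:])        # the 'b' block, global numbering
print(per_class[17:].tolist())  # the 'b' block, per-class numbering
```

The per-class column matches the expected output; it is essentially the grouped cumulative sum used in the mask solution.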