2

Given a DataFrame, I would like the group number of the values in one column id1, within each group of a second column id2.

I tried ngroup() to identify unique number groups by id1 and id2.

Here is an example df:

    id1  id2
0  1123  123
1  1123  123
2  1124  123
3  1124  123
4  1125  123
5  1125  123
6  1125  123
7  1126  122
8  1126  122
9  1127  122

Using ngroup():

df['row_id'] = df.groupby(['id1','id2']).ngroup() + 1

But it gave me this output:

row_id = [1, 1, 2, 2, 3, 3, 3, 4, 4, 5]

I would like the last 3 values to start again at 1, since they are for a new group of id2 (122); thus my desired output is:

row_id = [1, 1, 2, 2, 3, 3, 3, 1, 1, 2]
#                              ^ restart (id2 switches from 123 to 122)
Pierre D
  • 24,012
  • 7
  • 60
  • 96
shbrn
  • 33
  • 2

1 Answers1

1

Since it appears you would like a group number of id1 within each group of id2, then what you need to do is two .groupby() operations (a groupby within each group):

# reproducible (copy/paste) setup
df = pd.DataFrame({
    'id1': [1123, 1123, 1124, 1124, 1125, 1125, 1125, 1126, 1126, 1127],
    'id2': [123, 123, 123, 123, 123, 123, 123, 122, 122, 122]})

out = df.assign(
    row_id=df
    .groupby('id2', group_keys=False)
    .apply(lambda g: g.groupby('id1').ngroup()) + 1)

Gives:

>>> out
    id1  id2  row_id
0  1123  123       1
1  1123  123       1
2  1124  123       2
3  1124  123       2
4  1125  123       3
5  1125  123       3
6  1125  123       3
7  1126  122       1
8  1126  122       1
9  1127  122       2

Note: this handles the case where your df is unsorted:

>>> df = df.sample(frac=1)
>>> df
    id1  id2
8  1126  122
2  1124  123
6  1125  123
5  1125  123
7  1126  122
1  1123  123
3  1124  123
9  1127  122
4  1125  123
0  1123  123

and:

out = df.assign(
    row_id=df
    .groupby('id2', group_keys=False)
    .apply(lambda g: g.groupby('id1').ngroup()) + 1)

Gives:

>>> out
    id1  id2  row_id
8  1126  122       1
2  1124  123       1
6  1125  123       2
5  1125  123       2
7  1126  122       1
1  1123  123       3
3  1124  123       1
9  1127  122       2
4  1125  123       2
0  1123  123       3
Pierre D
  • 24,012
  • 7
  • 60
  • 96