-1

I wanted to assign the unique id based on the value from the column. For ex. i have a table like this:

df = pd.DataFrame({'A': [0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,0,1,1,1,0,0,0,0,1,1,1]}

Eventually I would like to have my output table looks like this:

A id
1 0 1
2 0 1
3 0 1
4 0 1
5 0 1
6 0 1
7 1 2
8 1 2
9 1 2
10 1 2
11 1 2
12 1 2
13 0 3
14 0 3
15 0 3
16 0 3
17 0 3
18 0 3
19 1 4
20 1 4
21 1 4
22 0 5
23 0 5
24 0 5
25 0 5
26 1 6
27 1 6
28 1 6

I tried data.groupby(['a'], sort=False).ngroup() + 1 but its not working as what I want. Any help and guidance will be appreciated! thanks!

Psidom
  • 209,562
  • 33
  • 339
  • 356
Ali
  • 113
  • 4

2 Answers2

0

diff + cumsum:

df['id'] = df.A.diff().ne(0).cumsum()

df
    A  id
0   0   1
1   0   1
2   0   1
3   0   1
4   0   1
5   0   1
6   0   1
7   1   2
8   1   2
9   1   2
10  1   2
11  1   2
12  1   2
13  0   3
14  0   3
15  0   3
16  0   3
17  0   3
18  0   3
19  1   4
20  1   4
21  1   4
22  0   5
23  0   5
24  0   5
25  0   5
26  1   6
27  1   6
28  1   6
Psidom
  • 209,562
  • 33
  • 339
  • 356
0
import pdrle
df["id"] = pdrle.get_id(df["A"]) + 1
df
#     A   id
# 0   0   1
# 1   0   1
# 2   0   1
# 3   0   1
# 4   0   1
# 5   0   1
# 6   0   1
# 7   1   2
# 8   1   2
# 9   1   2
# 10  1   2
# 11  1   2
# 12  1   2
# 13  0   3
# 14  0   3
# 15  0   3
# 16  0   3
# 17  0   3
# 18  0   3
# 19  1   4
# 20  1   4
# 21  1   4
# 22  0   5
# 23  0   5
# 24  0   5
# 25  0   5
# 26  1   6
# 27  1   6
# 28  1   6
d.b
  • 32,245
  • 6
  • 36
  • 77