Create a new column pandas based on another column condition

Question

I wanted to create a new column, let say named "group id" on the basis of:

compare the nth row with (n-1)th row.
if both the records are equal then in a "group id", previous "group id" is copied
If these records are not equal, then 1 should be added to "group id column".

I wanted to have the result in the following way:

Column A	Column B
6-Aug-10	0
30-Aug-11	1
31-Aug-11	2
31-Aug-11	2
6-Sep-12	3
30-Aug-13	4

Looking for the result, similar to this excel function =IF(T3=T2, U2, U2+1)

score 0 · Answer 1 · answered Nov 15 '22 at 06:37

0

you can use ngroup:

df['Group ID']=df.groupby('DOB').ngroup()

#according to your example

df['Group ID']=df.groupby('Column A').ngroup()

answered Nov 15 '22 at 06:37

Bushmaster

4,196
3
8
28

Thanks a lot Clegane, the solution is working fine – Sonakshi Nov 15 '22 at 12:37

jezrael · Answer 2 · 2022-11-15T06:46:09.743

Use factorize - consecutive groups are not count separately like compare shifted values with Series.cumsum and subtract 1:

print (df)
    Column A  Column B
0   6-Aug-10         0
1  30-Aug-11         1
2  31-Aug-11         2
3  31-Aug-11         2
4   6-Sep-12         3
5  30-Aug-13         4
6  30-Aug-11         5 <- added data for see difference
7  31-Aug-11         6 <- added data for see difference

df['Group ID1'] = pd.factorize(df['Column A'])[0]
df['Group ID2'] = df['Column A'].ne(df['Column A'].shift()).cumsum().sub(1)
print (df)
    Column A  Column B  Group ID1  Group ID2
0   6-Aug-10         0          0          0
1  30-Aug-11         1          1          1
2  31-Aug-11         2          2          2
3  31-Aug-11         2          2          2
4   6-Sep-12         3          3          3
5  30-Aug-13         4          4          4
6  30-Aug-11         5          1          5
7  31-Aug-11         5          2          6

Create a new column pandas based on another column condition

2 Answers2