-1

I wanted to create a new column, let say named "group id" on the basis of:

  1. compare the nth row with (n-1)th row.
  2. if both the records are equal then in a "group id", previous "group id" is copied
  3. If these records are not equal, then 1 should be added to "group id column".

I wanted to have the result in the following way:

The expected result

Column A Column B
6-Aug-10 0
30-Aug-11 1
31-Aug-11 2
31-Aug-11 2
6-Sep-12 3
30-Aug-13 4

Looking for the result, similar to this excel function =IF(T3=T2, U2, U2+1)

Sonakshi
  • 9
  • 2

2 Answers2

0

you can use ngroup:

df['Group ID']=df.groupby('DOB').ngroup()

#according to your example

df['Group ID']=df.groupby('Column A').ngroup()
Bushmaster
  • 4,196
  • 3
  • 8
  • 28
0

Use factorize - consecutive groups are not count separately like compare shifted values with Series.cumsum and subtract 1:

print (df)
    Column A  Column B
0   6-Aug-10         0
1  30-Aug-11         1
2  31-Aug-11         2
3  31-Aug-11         2
4   6-Sep-12         3
5  30-Aug-13         4
6  30-Aug-11         5 <- added data for see difference
7  31-Aug-11         6 <- added data for see difference

df['Group ID1'] = pd.factorize(df['Column A'])[0]
df['Group ID2'] = df['Column A'].ne(df['Column A'].shift()).cumsum().sub(1)
print (df)
    Column A  Column B  Group ID1  Group ID2
0   6-Aug-10         0          0          0
1  30-Aug-11         1          1          1
2  31-Aug-11         2          2          2
3  31-Aug-11         2          2          2
4   6-Sep-12         3          3          3
5  30-Aug-13         4          4          4
6  30-Aug-11         5          1          5
7  31-Aug-11         5          2          6
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252