How to add the sequential numbers if certain condition met in Pandas

Question

Need to increase the sequential numbers if certain condition is met or else need to hold the previous number.

Original_dataset:

ID	Name	Status	Cluster	Gap
1	A	0	1	15
1	B	1	1	35
1	C	1	1	03
2	B	0	1	26
2	C	0	1	16
3	A	1	1	65
3	C	0	1	89
3	F	0	1	19

Required_Dataset:

ID	Name	Status	Cluster	Gap
1	A	0	1	15
1	B	1	2	35
1	C	1	3	03
2	B	0	1	26
2	C	0	1	16
3	A	1	1	65
3	C	0	2	89
3	F	0	2	19

Conditions:

For first occurrence of ID, the cluster should be 1.
If status = 1 or Gap > 28, then cluster needs to increase by 1 based on patient ID (see row 1-C and 2-B --- as ID changes, the Cluster remains 1 as it is the first occurrence of the particular ID).
If the condition is not satisfied, it needs to hold the previous cluster number. (Can refer the final row).

The code which I have tried is:

Original_dataset.loc[((new_df4['gap'] > 28) | (Original_dataset['status'] == 1)),'Cluster'] = Original_dataset['Cluster'] + 1

score 2 · Answer 1 · answered Mar 08 '23 at 07:03

First set 1 if greater GAP like 28 or if Status is 1 or first duplicated value of ID in DataFrame.loc, then use lambda function with GroupBy.cumsum and GroupBy.ffill

m = df['Gap'].gt(28)
m1 = df['Status'].eq(1)
m2 = ~df['ID'].duplicated()

df.loc[m | m1 | m2, 'Cluster1'] = 1

f = lambda x: x.cumsum().ffill(downcast='int')
df['Cluster1'] = df.groupby('ID')['Cluster1'].transform(f)
print (df)
   ID Name  Status  Cluster  Gap  Cluster1
0   1    A       0        1   15         1
1   1    B       1        1   35         2
2   1    C       1        1    3         3
3   2    B       0        1   26         1
4   2    C       0        1   16         1
5   3    A       1        1   65         1
6   3    C       0        1   89         2
7   3    F       0        1   19         2

score 0 · Answer 2 · answered Mar 08 '23 at 07:53

You can use a mask based on 3 conditions, and a simple groupby.cumsum:

# is Gap≥28 OR Status==1 OR first ID ?
m = df['Gap'].ge(28) | df['Status'].eq(1) | ~df['ID'].duplicated()

# then increment count per group
df['Cluster'] = m.groupby(df['ID']).cumsum()

Output:

   ID Name  Status  Cluster  Gap
0   1    A       0        1   15
1   1    B       1        2   35
2   1    C       1        3    3
3   2    B       0        1   26
4   2    C       0        1   16
5   3    A       1        1   65
6   3    C       0        2   89
7   3    F       0        2   19

How to add the sequential numbers if certain condition met in Pandas

2 Answers2