I need to increment a sequential number (the `Cluster` column) when a certain condition is met, and otherwise keep the previous number.

Original_dataset:

ID | Name | Status | Cluster | Gap |
---|---|---|---|---|
1 | A | 0 | 1 | 15 |
1 | B | 1 | 1 | 35 |
1 | C | 1 | 1 | 03 |
2 | B | 0 | 1 | 26 |
2 | C | 0 | 1 | 16 |
3 | A | 1 | 1 | 65 |
3 | C | 0 | 1 | 89 |
3 | F | 0 | 1 | 19 |

Required_Dataset:

ID | Name | Status | Cluster | Gap |
---|---|---|---|---|
1 | A | 0 | 1 | 15 |
1 | B | 1 | 2 | 35 |
1 | C | 1 | 3 | 03 |
2 | B | 0 | 1 | 26 |
2 | C | 0 | 1 | 16 |
3 | A | 1 | 1 | 65 |
3 | C | 0 | 2 | 89 |
3 | F | 0 | 2 | 19 |

Conditions:
- For the first occurrence of an ID, the cluster is always 1 (even for row 3-A, whose `Status` is 1).
- Otherwise, if `Status` = 1 or `Gap` > 28, the cluster increases by 1 within the same (patient) ID. See rows 1-C and 2-B: when the ID changes, the cluster goes back to 1 because it is the first occurrence of that ID.
- If neither condition is met, the row keeps the previous cluster number (see the final row, 3-F).
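The three conditions can be read as a per-row rule. The sketch below applies them literally with a loop, assuming the rows of each ID are contiguous and already in the order shown, and that `Status` and `Gap` are numeric (so `03` becomes `3`). It is meant as a readable reference for the expected behaviour, not as an efficient solution.

```python
import pandas as pd

# Sample data from the question (Cluster is recomputed below)
df = pd.DataFrame({
    "ID":     [1, 1, 1, 2, 2, 3, 3, 3],
    "Name":   ["A", "B", "C", "B", "C", "A", "C", "F"],
    "Status": [0, 1, 1, 0, 0, 1, 0, 0],
    "Gap":    [15, 35, 3, 26, 16, 65, 89, 19],
})

clusters = []
prev_id = None
current = 0
for row in df.itertuples(index=False):
    if row.ID != prev_id:
        # First occurrence of this ID (assumes rows of an ID are contiguous): start at 1
        current = 1
    elif row.Status == 1 or row.Gap > 28:
        # Condition met: move on to the next cluster number
        current += 1
    # Otherwise the previous cluster number is kept unchanged
    clusters.append(current)
    prev_id = row.ID

df["Cluster"] = clusters
print(df)
```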

The code I have tried is:

    # Only the matching rows are incremented (1 -> 2); nothing is carried forward to later rows
    Original_dataset.loc[(Original_dataset['Gap'] > 28) | (Original_dataset['Status'] == 1), 'Cluster'] = Original_dataset['Cluster'] + 1
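The `.loc` assignment only changes the rows where the condition holds, so it cannot produce a running count. One way to get the required output is a cumulative sum of the condition within each `ID`, with the first row of every ID excluded so that it always starts at 1. A minimal sketch, assuming pandas and the column names from the tables above (`df` stands in for `Original_dataset`):

```python
import pandas as pd

# Sample data from the question (Cluster is recomputed below)
df = pd.DataFrame({
    "ID":     [1, 1, 1, 2, 2, 3, 3, 3],
    "Name":   ["A", "B", "C", "B", "C", "A", "C", "F"],
    "Status": [0, 1, 1, 0, 0, 1, 0, 0],
    "Gap":    [15, 35, 3, 26, 16, 65, 89, 19],
})

# True where the row should start a new cluster: Status == 1 or Gap > 28
cond = (df["Status"] == 1) | (df["Gap"] > 28)

# duplicated() is False for the first occurrence of each ID, so this marks first rows
is_first = ~df["ID"].duplicated()

# The first row of an ID never increments, whatever its Status or Gap
step = (cond & ~is_first).astype(int)

# Running count of increments within each ID, offset so every ID starts at 1
df["Cluster"] = step.groupby(df["ID"]).cumsum() + 1
print(df)
```

Unlike a row-by-row loop, this does not need the rows of an ID to be contiguous; it only assumes they appear in the intended order within each ID, since `cumsum` follows row order inside each group.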