Let's say we have the following DataFrame:

|    |   iiD  | Suppressant               | Suppressant_New |
|---:|-------:|:--------------------------|-----------------|
|  0 |      0 | EI402                     | EI503           |
|  1 |      0 | EI503                     | EG-CA 422       |
|  2 |      0 | EG-CA 422                 | EG-CAX          |
|  6 |      0 | EG-CAX                    | None            |
|  7 |      1 | EH333                     | ET777           |
|  8 |      1 | ET777                     | EH422           |
|  8 |      1 | EI503                     | EG-CA 422       |
|  9 |      1 | EG-CA 422                 | None            |
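
To make this reproducible, here is a minimal sketch that rebuilds the frame above. It assumes the `None` entries in `Suppressant_New` are real missing values rather than the string `"None"`, and it keeps the repeated index label 8 exactly as shown in the table.

```python
import pandas as pd

# Rebuild the example frame from the table above.
# Assumption: "None" in Suppressant_New is a missing value, not a string.
df = pd.DataFrame(
    {
        "iiD": [0, 0, 0, 0, 1, 1, 1, 1],
        "Suppressant": [
            "EI402", "EI503", "EG-CA 422", "EG-CAX",
            "EH333", "ET777", "EI503", "EG-CA 422",
        ],
        "Suppressant_New": [
            "EI503", "EG-CA 422", "EG-CAX", None,
            "ET777", "EH422", "EG-CA 422", None,
        ],
    },
    # The index label 8 appears twice in the table, so it is repeated here too.
    index=[0, 1, 2, 6, 7, 8, 8, 9],
)

print(df)
```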

Now I want to count how often each row's (`Suppressant`, `Suppressant_New`) pair occurs in the whole DataFrame and add that count as a separate column, keeping `iiD`. The result should look like this:

|    |   iiD  | Suppressant               | Suppressant_New | count |
|---:|-------:|:--------------------------|-----------------|------:|
|  0 |      0 | EI402                     | EI503           |     1 |
|  1 |      0 | EI503                     | EG-CA 422       |     2 |
|  2 |      0 | EG-CA 422                 | EG-CAX          |     1 |
|  6 |      0 | EG-CAX                    | None            |     1 |
|  7 |      1 | EH333                     | ET777           |     1 |
|  8 |      1 | ET777                     | EH422           |     1 |
|  8 |      1 | EI503                     | EG-CA 422       |     2 |
|  9 |      1 | EG-CA 422                 | None            |     1 |
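
For reference, here is a rough sketch of one approach that seems to reproduce this table, though it may not be the most efficient: group on the two suppressant columns and broadcast each group's size back to its rows with `transform("size")`. The names `pair_cols` and `keys` and the `"<missing>"` placeholder are only illustrative, and I'm again assuming `None` means a missing value. The placeholder is there because `groupby` skips missing keys by default.

```python
import pandas as pd

# Same example frame as in the question (None assumed to be a missing value).
df = pd.DataFrame(
    {
        "iiD": [0, 0, 0, 0, 1, 1, 1, 1],
        "Suppressant": [
            "EI402", "EI503", "EG-CA 422", "EG-CAX",
            "EH333", "ET777", "EI503", "EG-CA 422",
        ],
        "Suppressant_New": [
            "EI503", "EG-CA 422", "EG-CAX", None,
            "ET777", "EH422", "EG-CA 422", None,
        ],
    },
    index=[0, 1, 2, 6, 7, 8, 8, 9],
)

# Columns that define a "duplicate" row (illustrative name).
pair_cols = ["Suppressant", "Suppressant_New"]

# Replace missing values with a placeholder so those rows form their own
# groups instead of being dropped by groupby's default handling of NaN keys.
keys = df[pair_cols].fillna("<missing>")

# transform("size") returns one value per original row: the size of its group.
# The result carries the same index as df, so the assignment lines up row by row.
df["count"] = keys.groupby(pair_cols)["Suppressant"].transform("size")

print(df)
```

This leaves `iiD` and the original index untouched, since nothing is aggregated away or merged back.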

What is an efficient way to do this with Pandas?

Niflheim