As the title says: how do you create a new column in PySpark that cumulatively counts consecutive repeated values, restarting the count whenever the value changes?
For instance:
| Value |
|-------|
| 0 |
| 0 |
| 5 |
| 5 |
| -1 |
| 0 |
| 0 |
| 0 |
Applying this to the `Value` column would produce a new `Result` column:
| Value | Result |
|-------|--------|
| 0 | 1 |
| 0 | 2 |
| 5 | 1 |
| 5 | 2 |
| -1 | 1 |
| 0 | 1 |
| 0 | 2 |
| 0 | 3 |
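
To pin down the intended logic, here is a minimal plain-Python sketch of the same computation. It is not a PySpark solution, only a reference for the expected output; the function name `consecutive_run_counts` is made up for illustration. Note that it relies on the list order, whereas a Spark DataFrame has no inherent row order, so a PySpark version would presumably need an explicit ordering column.

```python
from itertools import groupby


def consecutive_run_counts(values):
    """Return each element's 1-based position within its run of equal consecutive values."""
    result = []
    # groupby splits the sequence into runs of equal adjacent values
    for _, run in groupby(values):
        run_length = len(list(run))
        # number the elements of the run 1, 2, ..., run_length
        result.extend(range(1, run_length + 1))
    return result


values = [0, 0, 5, 5, -1, 0, 0, 0]
print(consecutive_run_counts(values))  # [1, 2, 1, 2, 1, 1, 2, 3]
```

The count restarts at 1 for the final run of zeros even though 0 appeared earlier, which is the behaviour shown in the table above.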