I have a pyspark dataframe with 3 columns:
ID, each appearing multiple times; DATE; DELAY, which is 0 if the bill was paid on time and 1 otherwise.
It's already ordered by ID and DATE. I need to create a column named CONSECUTIVE that counts the current run of consecutive bills with DELAY=1 for each ID, resetting to 0 whenever DELAY=0.
Example of data, and expected result:
ID | DATE | DELAY | CONSECUTIVE
101 | 1 | 1 | 1
101 | 2 | 1 | 2
101 | 3 | 1 | 3
101 | 4 | 0 | 0
101 | 5 | 1 | 1
101 | 6 | 1 | 2
213 | 1 | 1 | 1
213 | 2 | 1 | 2
Is there a way to do this in pure PySpark, without converting to Pandas? If so, how?