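I am trying to create a column that increments every time the state changes from the previous row, with rows ordered by epoch. The increment should happen whether or not the new state has been seen before, so a return to `open` gets a new index instead of reusing the earlier one.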

| epoch | state  | state_idx |
|-------|--------|-----------|
| 1     | open   | 1         |
| 2     | open   | 1         |
| 3     | closed | 2         |
| 4     | closed | 2         |
| 5     | open   | 3         |
| 6     | open   | 3         |
| 7     | open   | 3         |

I want state_idx so that I can group by key on it. Once the rows are grouped into these runs, I expect them to be faster to process on a Spark cluster.
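
To make the intended logic concrete, here is a minimal plain-Python sketch of what I mean, assuming the rows are already sorted by epoch. The function name `state_change_index` is just something I made up for illustration. It flags each row whose state differs from the previous row, then takes a running total of those flags:

```python
from itertools import accumulate


def state_change_index(states):
    """Return a 1-based index that increments whenever the state
    differs from the previous row (input assumed sorted by epoch)."""
    # 1 where a new run starts (first row, or state differs from the
    # previous row), otherwise 0.
    changed = [
        1 if i == 0 or s != states[i - 1] else 0
        for i, s in enumerate(states)
    ]
    # The running total of the change flags is the run index.
    return list(accumulate(changed))


states = ["open", "open", "closed", "closed", "open", "open", "open"]
print(state_change_index(states))  # [1, 1, 2, 2, 3, 3, 3]
```

In Spark I would guess the same two steps map onto `lag` over a window ordered by epoch, followed by a running `sum` of the change flag over that window. One caveat I am aware of: a window with no partitioning pulls all rows into a single partition, so that may not scale well without some partition key.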