I have a simple time series in which an operator turns a switch on and off. I want to label each "turned on" phase with its own ID: rows where the switch is off get 0, and the on-phases are numbered 1, 2, ... in time order. With the eventID column added, the expected result would look like this:
    // Columns: timestamp, switch (0 = off, 1 = on), eventID (desired output)
    val eventDF = sc.parallelize(List(
      ("2016-05-01 10:00:00", 0, 0),
      ("2016-05-01 10:00:30", 0, 0),
      ("2016-05-01 10:01:00", 1, 1),
      ("2016-05-01 10:01:20", 1, 1),
      ("2016-05-01 10:02:10", 1, 1),
      ("2016-05-01 10:03:30", 0, 0),
      ("2016-05-01 10:04:00", 0, 0),
      ("2016-05-01 10:05:20", 0, 0),
      ("2016-05-01 10:06:10", 1, 2),
      ("2016-05-01 10:06:30", 1, 2),
      ("2016-05-01 10:07:00", 1, 2),
      ("2016-05-01 10:07:20", 0, 0),
      ("2016-05-01 10:08:10", 0, 0),
      ("2016-05-01 10:08:50", 0, 0)
    )).toDF("timestamp", "switch", "eventID")
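To make the intended rule precise, here is a minimal sketch of the labeling logic in plain Scala collections (no Spark involved). The name labelEvents is made up for illustration, and the sketch assumes the rows are already sorted by timestamp and that switch only takes the values 0 and 1. What I am looking for is the DataFrame equivalent of this.

```scala
// Plain Scala collections, not Spark: only meant to pin down the labeling rule.
// labelEvents is a hypothetical helper name, not part of any library.
def labelEvents(switches: Seq[Int]): Seq[Int] = {
  // Pair each reading with the previous one (treating the switch as off before the first row).
  val pairs = (0 +: switches).zip(switches)
  // An on-phase starts wherever the switch goes from 0 to 1.
  val starts = pairs.map { case (prev, cur) => if (prev == 0 && cur == 1) 1 else 0 }
  // The running count of starts is the phase number at each row.
  val phase = starts.scanLeft(0)(_ + _).tail
  // Rows where the switch is off keep the ID 0.
  switches.zip(phase).map { case (sw, p) => if (sw == 1) p else 0 }
}

// The switch column from the example above, in timestamp order.
val switches = Seq(0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0)
val eventIDs = labelEvents(switches)
// eventIDs == Seq(0, 0, 1, 1, 1, 0, 0, 0, 2, 2, 2, 0, 0, 0)
```

In other words: detect the 0-to-1 transitions, take a running sum over them, and zero out the rows where the switch is off.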
So far I have tried the rank, rangeBetween, and lag window functions without any luck, so any hint is appreciated.