I have table1
number type time
--------------------
1 on 1
1 on 5
1 off 10
1 off 15
I need to transform it into table2
number type time
--------------------
1 on 1
1 off 2
1 on 5
1 off 10
1 on 14
1 off 15
The conditions are if $"type" === "ON" && $"lag_type" =!= "OFF"
I copy lagging row, but replace type with OFF and increment time by a second. If $"type" =!= "ON" && $"lag_type" === "OFF"
I copy current row, but replace type with ON and decrease time by a second. If lag type is null skip.
I found which rows should be copied
val window = Window.partitionBy($"number").orderBy($"time")
df
.withColumn("lag_type", lag($"type", 1, null).over(window))
.withColumn("lag1", $"type" === "ON" && $"lag_type" =!= "OFF")
.withColumn("lag2", $"type" =!= "ON" && $"lag_type" === "OFF")
But don't know how to add rows. Especially if they are based on the values from the lagging one. For the current one, I could probably make lists
type time
(on, off) (14, 15)
and explode them like it's show here, but again I'm lost when it comes to the lagging row.
Any advice? I'm using spark 2.2. Is it possible with case classes?