I have a DataFrame, ordered by timestamp, that I would like to split into separate DataFrames wherever the distance value is greater than 1000.
The DataFrame looks like this:
+-----------------+-------------------+---+
|timestamp        |distance           |id |
+-----------------+-------------------+---+
|1541712752000    |1.1990470282994594 |123|
|1541713551000    |1.5804709872862326 |123|
|1541714462000    |0.0                |123|
|1541715475000    |0.53107795768697   |123|
|1541716383000    |0.53107795768697   |123|
|1541716792000    |0.24740321078091282|123|
|1541717695000    |1542.00            |123|
|1541717801000    |2.7767418047706816 |123|
|1541718779000    |13.058715260118664 |123|
|1541719672000    |22.64146251404579  |123|
|1541720581000    |23.861007122654314 |123|
|1541721502000    |16.327504368653443 |123|
|1541722572000    |26.084599108380274 |123|
|1541723500000    |20.630034360787512 |123|
|1541724219000    |1893.00            |123|
|1541725264000    |23.16455204686255  |123|
|1541726037000    |15.911555304774817 |123|
|1541726950000    |20.057274313740784 |123|
|1541727884000    |12.967418789242549 |123|
|1541728085000    |2.720850595301784  |123|
+-----------------+-------------------+---+
Each row whose distance is greater than 1000 should start a new DataFrame, so for the data above I would like to end up with three DataFrames that look like this:
+-----------------+-------------------+---+
|timestamp        |distance           |id |
+-----------------+-------------------+---+
|1541712752000    |1.1990470282994594 |123|
|1541713551000    |1.5804709872862326 |123|
|1541714462000    |0.0                |123|
|1541715475000    |0.53107795768697   |123|
|1541716383000    |0.53107795768697   |123|
|1541716792000    |0.24740321078091282|123|
+-----------------+-------------------+---+
+-----------------+-------------------+---+
|timestamp        |distance           |id |
+-----------------+-------------------+---+
|1541717695000    |1542.00            |123|
|1541717801000    |2.7767418047706816 |123|
|1541718779000    |13.058715260118664 |123|
|1541719672000    |22.64146251404579  |123|
|1541720581000    |23.861007122654314 |123|
|1541721502000    |16.327504368653443 |123|
|1541722572000    |26.084599108380274 |123|
|1541723500000    |20.630034360787512 |123|
+-----------------+-------------------+---+
+-----------------+-------------------+---+
|timestamp        |distance           |id |
+-----------------+-------------------+---+
|1541724219000    |1893.00            |123|
|1541725264000    |23.16455204686255  |123|
|1541726037000    |15.911555304774817 |123|
|1541726950000    |20.057274313740784 |123|
|1541727884000    |12.967418789242549 |123|
|1541728085000    |2.720850595301784  |123|
+-----------------+-------------------+---+
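To make the splitting rule concrete, here is a minimal sketch of the grouping in plain Python rather than Spark. The function name `split_on_distance` and the `(timestamp, distance, id)` tuple layout are placeholders I made up for illustration. The idea is a running count of the rows with distance greater than 1000, used as a segment number; I assume the Spark version would be a cumulative sum of that flag over a window ordered by timestamp, but that part is only a guess.

```python
from itertools import accumulate

THRESHOLD = 1000  # split threshold from the question


def split_on_distance(rows, threshold=THRESHOLD):
    """Split (timestamp, distance, id) rows into time-ordered segments.

    A new segment starts at every row whose distance exceeds `threshold`.
    Hypothetical helper, for illustration only.
    """
    rows = sorted(rows, key=lambda r: r[0])  # order by timestamp
    # Running count of "distance > threshold" rows seen so far = segment number.
    seg_ids = accumulate(1 if r[1] > threshold else 0 for r in rows)
    segments = {}
    for seg_id, row in zip(seg_ids, rows):
        segments.setdefault(seg_id, []).append(row)
    return [segments[k] for k in sorted(segments)]


# Same rows as the table above.
rows = [
    (1541712752000, 1.1990470282994594, 123),
    (1541713551000, 1.5804709872862326, 123),
    (1541714462000, 0.0, 123),
    (1541715475000, 0.53107795768697, 123),
    (1541716383000, 0.53107795768697, 123),
    (1541716792000, 0.24740321078091282, 123),
    (1541717695000, 1542.00, 123),
    (1541717801000, 2.7767418047706816, 123),
    (1541718779000, 13.058715260118664, 123),
    (1541719672000, 22.64146251404579, 123),
    (1541720581000, 23.861007122654314, 123),
    (1541721502000, 16.327504368653443, 123),
    (1541722572000, 26.084599108380274, 123),
    (1541723500000, 20.630034360787512, 123),
    (1541724219000, 1893.00, 123),
    (1541725264000, 23.16455204686255, 123),
    (1541726037000, 15.911555304774817, 123),
    (1541726950000, 20.057274313740784, 123),
    (1541727884000, 12.967418789242549, 123),
    (1541728085000, 2.720850595301784, 123),
]

parts = split_on_distance(rows)
print([len(p) for p in parts])  # [6, 8, 6]
```

The three lists correspond to the three DataFrames shown above: the row with the large distance is the first row of its segment, not the last row of the previous one.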
I'm using Spark 2.0.0.
Thanks