
I have a df ordered by timestamp that I would like to split into separate dfs wherever the distance value is greater than 1000.

The df looks like this:

+-----------------+-------------------+---+
|timestamp        |distance           |id |
+-----------------+-------------------+---+
|1541712752000    |1.1990470282994594 |123|
|1541713551000    |1.5804709872862326 |123|
|1541714462000    |0.0                |123|
|1541715475000    |0.53107795768697   |123|
|1541716383000    |0.53107795768697   |123|
|1541716792000    |0.24740321078091282|123|
|1541717695000    |1542.00            |123|
|1541717801000    |2.7767418047706816 |123|
|1541718779000    |13.058715260118664 |123|
|1541719672000    |22.64146251404579  |123|
|1541720581000    |23.861007122654314 |123|
|1541721502000    |16.327504368653443 |123|
|1541722572000    |26.084599108380274 |123|
|1541723500000    |20.630034360787512 |123|
|1541724219000    |1893.00            |123|
|1541725264000    |23.16455204686255  |123|
|1541726037000    |15.911555304774817 |123|
|1541726950000    |20.057274313740784 |123|
|1541727884000    |12.967418789242549 |123|
|1541728085000    |2.720850595301784  |123|
+-----------------+-------------------+---+

After splitting the df wherever distance is greater than 1000, I would like to end up with three dfs, each new one starting at the row that exceeds the threshold:

+-----------------+-------------------+---+
|timestamp        |distance           |id |
+-----------------+-------------------+---+
|1541712752000    |1.1990470282994594 |123|
|1541713551000    |1.5804709872862326 |123|
|1541714462000    |0.0                |123|
|1541715475000    |0.53107795768697   |123|
|1541716383000    |0.53107795768697   |123|
|1541716792000    |0.24740321078091282|123|
+-----------------+-------------------+---+

+-----------------+-------------------+---+
|timestamp        |distance           |id |
+-----------------+-------------------+---+
|1541717695000    |1542.00            |123|
|1541717801000    |2.7767418047706816 |123|
|1541718779000    |13.058715260118664 |123|
|1541719672000    |22.64146251404579  |123|
|1541720581000    |23.861007122654314 |123|
|1541721502000    |16.327504368653443 |123|
|1541722572000    |26.084599108380274 |123|
|1541723500000    |20.630034360787512 |123|
+-----------------+-------------------+---+

+-----------------+-------------------+---+
|timestamp        |distance           |id |
+-----------------+-------------------+---+
|1541724219000    |1893.00            |123|
|1541725264000    |23.16455204686255  |123|
|1541726037000    |15.911555304774817 |123|
|1541726950000    |20.057274313740784 |123|
|1541727884000    |12.967418789242549 |123|
|1541728085000    |2.720850595301784  |123|
+-----------------+-------------------+---+
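
To make the splitting rule concrete, here is a minimal plain-Python sketch (not Spark code) of the segmentation I am after, using a shortened subset of the rows above. The names `rows`, `THRESHOLD` and `split_on_distance` are only illustrative. The idea is to flag each row whose distance exceeds the threshold and take a running sum of the flags as a segment number. I assume a Spark solution could do the same with a running sum over a window ordered by timestamp, but I have not verified that.

```python
from itertools import accumulate

# Shortened sample as (timestamp, distance, id), already sorted by timestamp.
rows = [
    (1541716383000, 0.53107795768697, 123),
    (1541716792000, 0.24740321078091282, 123),
    (1541717695000, 1542.00, 123),
    (1541717801000, 2.7767418047706816, 123),
    (1541723500000, 20.630034360787512, 123),
    (1541724219000, 1893.00, 123),
    (1541725264000, 23.16455204686255, 123),
]

THRESHOLD = 1000  # hypothetical name; a row above this distance starts a new segment


def split_on_distance(rows, threshold=THRESHOLD):
    """Split time-ordered rows into segments; each row above threshold opens a new one."""
    # 1 where a new segment starts, 0 otherwise.
    flags = [1 if distance > threshold else 0 for _, distance, _ in rows]
    # The running sum of the flags gives every row its segment number.
    segment_ids = accumulate(flags)
    segments = {}
    for segment_id, row in zip(segment_ids, rows):
        segments.setdefault(segment_id, []).append(row)
    # Return the segments in order of their segment number.
    return [segments[key] for key in sorted(segments)]


segments = split_on_distance(rows)
for segment in segments:
    print(segment)
```

Each inner list corresponds to one of the dfs shown above, with the row that exceeds the threshold as its first element.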

I'm using Spark 2.0.0

Thanks
