I have a PySpark DataFrame with `starttime` and `stoptime` columns, plus additional columns whose values get updated over time:
|starttime |stoptime  |hour |minute |sec |sip          |dip           |sport|dport|proto|pkt |byt |
|1504766585|1504801216|16 |20 |16 |192.168.0.11 |23.204.108.58 |51249|80 |6 |0 |0 |
|1504766585|1504801216|16 |20 |16 |192.168.0.11 |23.204.108.58 |51249|80 |6 |0 |0 |
|1504781751|1504801216|16 |20 |16 |192.168.0.11 |23.72.38.96 |51252|80 |6 |0 |0 |
|1504781751|1504801216|16 |20 |16 |192.168.0.11 |23.72.38.96 |51252|80 |6 |0 |0 |
|1504766585|1504801336|16 |22 |16 |192.168.0.11 |23.204.108.58 |51249|80 |6 |0 |0 |
|1504766585|1504801336|16 |22 |16 |192.168.0.11 |23.204.108.58 |51249|80 |6 |0 |0 |
|1504781751|1504801336|16 |22 |16 |192.168.0.11 |23.72.38.96 |51252|80 |6 |0 |0 |
|1504781751|1504801336|16 |22 |16 |192.168.0.11 |23.72.38.96 |51252|80 |6 |0 |0 |
In this example I want to select only the rows with the latest `stoptime`; all the other column values are duplicates of earlier rows.
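To make the goal concrete, here is a plain-Python sketch of the selection logic, outside Spark. It assumes the flow key is `(sip, dip, sport, dport)` (my assumption; adjust to whatever identifies a flow in your data): keep, per key, only the rows whose `stoptime` equals the per-key maximum, then drop exact duplicates.

```python
from collections import defaultdict

# Sample rows mirroring the table above:
# (starttime, stoptime, sip, dip, sport, dport)
rows = [
    (1504766585, 1504801216, "192.168.0.11", "23.204.108.58", 51249, 80),
    (1504766585, 1504801216, "192.168.0.11", "23.204.108.58", 51249, 80),
    (1504781751, 1504801216, "192.168.0.11", "23.72.38.96", 51252, 80),
    (1504781751, 1504801216, "192.168.0.11", "23.72.38.96", 51252, 80),
    (1504766585, 1504801336, "192.168.0.11", "23.204.108.58", 51249, 80),
    (1504766585, 1504801336, "192.168.0.11", "23.204.108.58", 51249, 80),
    (1504781751, 1504801336, "192.168.0.11", "23.72.38.96", 51252, 80),
    (1504781751, 1504801336, "192.168.0.11", "23.72.38.96", 51252, 80),
]

# Per-flow maximum stoptime; the flow key (sip, dip, sport, dport)
# is an assumption about what identifies a flow.
max_stop = defaultdict(int)
for r in rows:
    max_stop[r[2:6]] = max(max_stop[r[2:6]], r[1])

# Keep only rows at the per-flow max stoptime, dropping exact duplicates.
latest, seen = [], set()
for r in rows:
    if r[1] == max_stop[r[2:6]] and r not in seen:
        seen.add(r)
        latest.append(r)

print(latest)  # two rows remain, both with stoptime 1504801336
```

In PySpark the same idea could be expressed with a window over the key columns, along the lines of `F.max("stoptime").over(Window.partitionBy("sip", "dip", "sport", "dport"))`, followed by a filter on equality and a `dropDuplicates()`; I have not verified that exact snippet against this data.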