A simple question: do collect, zipWithIndex, map and flatMap preserve order on an RDD with 1 partition?
Thanks
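RDDs are most commonly sorted by key with sortByKey(), which requires key/value data. Non-key/value data can also be sorted, using sortBy() with a function that extracts the value to sort on.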
But if you do have some sorted key/value data in an RDD, then collect will preserve the order. Note though that collectAsMap() will not preserve the order.
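map() returns non-key/value data, so the returned RDD can no longer be described as sorted by key. It does process the elements of each partition in iteration order, so the relative order of elements within a partition is kept. The same goes for flatMap().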
What about mapToPair() and flatMapToPair()? If the RDD that these work on contains key/value data, there is no reason to assume that the keys of the output RDD are the same, so the output cannot be assumed to be sorted by key. Even when the key is unchanged, I would not expect these methods to have been implemented specifically to preserve a sort order.
mapValues() and flatMapValues() do preserve the key of the input RDD, so it might be possible that the order is preserved, but you will have to investigate this yourself.
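As for zipWithIndex(), see "How Can I Obtain an Element Position in Spark's RDD?". The Spark API documentation states that indices are ordered first by partition index and then by the order of items within each partition, so zipWithIndex() follows the existing element order rather than changing it. The documentation also warns that some RDDs, such as those returned by groupBy(), do not guarantee the order of elements within a partition, in which case the assigned indices are not guaranteed either.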
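One way to reason about the single-partition case is to model the partition as an ordered list and write flatMap, map and zipWithIndex as plain passes over it. The sketch below is plain Java and does not use Spark at all. The class name SinglePartitionOrder and its helper methods are hypothetical, and the model assumes that each operation visits the elements of the partition in iteration order.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of a single partition as an ordered List. Not Spark code.
class SinglePartitionOrder {

    // Model of map(): apply f to each element in iteration order.
    static <T, R> List<R> map(List<T> partition, Function<T, R> f) {
        List<R> out = new ArrayList<>();
        for (T t : partition) {
            out.add(f.apply(t));
        }
        return out;
    }

    // Model of flatMap(): each element expands to zero or more outputs,
    // emitted at the position of the element that produced them.
    static <T, R> List<R> flatMap(List<T> partition, Function<T, Iterator<R>> f) {
        List<R> out = new ArrayList<>();
        for (T t : partition) {
            Iterator<R> it = f.apply(t);
            while (it.hasNext()) {
                out.add(it.next());
            }
        }
        return out;
    }

    // Model of zipWithIndex(): pair each element with its position, starting at 0.
    static <T> List<Map.Entry<T, Long>> zipWithIndex(List<T> partition) {
        List<Map.Entry<T, Long>> out = new ArrayList<>();
        long index = 0;
        for (T t : partition) {
            out.add(new AbstractMap.SimpleEntry<>(t, index++));
        }
        return out;
    }

    public static void main(String[] args) {
        // Deliberately unsorted input, so any reordering would be visible.
        List<String> lines = Arrays.asList("b a", "d c");
        List<String> words = flatMap(lines, l -> Arrays.asList(l.split(" ")).iterator());
        List<String> upper = map(words, String::toUpperCase);
        System.out.println(zipWithIndex(upper)); // prints [B=0, A=1, D=2, C=3]
    }
}
```

To check this against Spark itself, the same pipeline can be run on an RDD created with parallelize(data, 1) and the result of collect() compared with the input order.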