I've come across the glom()
method on RDD. As per the documentation
Return an RDD created by coalescing all elements within each partition into an array
Does glom
shuffle the data across the partitions or does it only return the partition data as an array? In the latter case, I believe that the same can be achieved using mapPartitions
.
I would also like to know if there are any use cases that benefit from glom
.