If we do not specify a partitioner for a reduceByKey operation, does it perform hash partitioning internally before the reduction? For example, my test code is:
val rdd = sc.parallelize(Seq((5, 1), (10, 2), (15, 3), (5, 4), (5, 1), (5, 3), (5, 9), (5, 6)), 5)
val newRdd = rdd.reduceByKey((a, b) => a + b)
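One way to check this empirically is to print the partitioner that reduceByKey attaches to its result and look at what lands in each output partition. This is a minimal sketch, assuming Spark 2.x or later (spark-core) on the classpath and a local master; the object name ReduceByKeyPartitioning is illustrative, not part of the original code:

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object ReduceByKeyPartitioning {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("reduceByKey-partitioning")
    val sc = new SparkContext(conf)

    val rdd = sc.parallelize(Seq((5, 1), (10, 2), (15, 3), (5, 4), (5, 1), (5, 3), (5, 9), (5, 6)), 5)
    val newRdd = rdd.reduceByKey(_ + _)

    // Which partitioner did reduceByKey choose when none was passed in?
    println(s"partitioner: ${newRdd.partitioner}, partitions: ${newRdd.getNumPartitions}")

    // glom() turns each partition into an array, so we can see where each
    // reduced (key, sum) pair ended up.
    newRdd.glom().collect().zipWithIndex.foreach { case (records, idx) =>
      println(s"partition $idx: ${records.mkString(", ")}")
    }

    // The same reduction with the partitioner spelled out, for comparison.
    val explicitRdd = rdd.reduceByKey(new HashPartitioner(rdd.getNumPartitions), _ + _)
    println(s"explicit partitioner: ${explicitRdd.partitioner}")

    sc.stop()
  }
}
```

When spark.default.parallelism is not set, the default partitioner is expected to be a HashPartitioner sized to the input RDD's partition count, so the explicit variant at the end should behave the same as the default call.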
Here, when no partitioner is specified (as in the code above), does the reduceByKey operation bring all records with the same key to the same partition and then perform the reduction? My use case has skewed data (all records have the same key), so if it brings all records to one partition, it could cause an out-of-memory error. A uniform distribution of the records across all partitions would suit this use case.
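Relevant to the skew concern: the Scaladoc for reduceByKey states that it also merges values locally on each mapper before sending results to a reducer, similar to a MapReduce combiner. The pure-Scala sketch below mimics that two-phase process without Spark, under stated assumptions: the 8 sample records are split into 5 contiguous input partitions (a hypothetical layout), values are combined per key inside each one, and the combined records are routed with a non-negative hashCode-modulo rule like HashPartitioner's. MapSideCombineSketch and its helpers are hypothetical names, not Spark APIs:

```scala
object MapSideCombineSketch {
  type Record = (Int, Int)

  val sample: Seq[Record] =
    Seq((5, 1), (10, 2), (15, 3), (5, 4), (5, 1), (5, 3), (5, 9), (5, 6))

  // hashCode modulo numPartitions, shifted to be non-negative
  // (the rule HashPartitioner uses to pick a target partition).
  def partitionFor(key: Any, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // Hypothetical layout: split the input into numSlices contiguous chunks.
  def slice(data: Seq[Record], numSlices: Int): Seq[Seq[Record]] =
    (0 until numSlices).map { i =>
      data.slice(i * data.length / numSlices, (i + 1) * data.length / numSlices)
    }

  // Reduce all values that share a key within one collection of records.
  def reduceLocally(records: Seq[Record], f: (Int, Int) => Int): Map[Int, Int] =
    records.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }

  // Returns (number of records sent through the simulated shuffle,
  //          output partition index -> reduced key/value pairs in it).
  def simulate(data: Seq[Record], numPartitions: Int, f: (Int, Int) => Int): (Int, Map[Int, Map[Int, Int]]) = {
    // Map side: combine per key inside each input partition.
    val combined: Seq[Record] =
      slice(data, numPartitions).flatMap(p => reduceLocally(p, f).toSeq)
    // Reduce side: route each combined record by key hash, then reduce again.
    val output = combined
      .groupBy { case (k, _) => partitionFor(k, numPartitions) }
      .map { case (p, recs) => p -> reduceLocally(recs, f) }
    (combined.size, output)
  }

  def main(args: Array[String]): Unit = {
    val (shuffled, output) = simulate(sample, 5, _ + _)
    println(s"raw records: ${sample.size}, records shuffled after map-side combine: $shuffled")
    output.toSeq.sortBy(_._1).foreach { case (p, kv) => println(s"partition $p: $kv") }
  }
}
```

With 5 partitions, keys 5, 10 and 15 are all multiples of 5, so this rule sends all three to partition 0. The point of the sketch is that the partition owning a key receives at most one pre-combined value per input partition, not every raw record for that key.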