So suppose I have the following transformations:
val myRDD = someRDD.map()
val mySecondRDD = myRDD.aggregateByKey(initValue)(CombOp, MergeOp)
At this point myRDD has no partitioner, but mySecondRDD has a HashPartitioner. First, I want to ask:
1) Do I have to assign a partitioner to myRDD? And if I do, how can I pass it as an argument to aggregateByKey?
*Note that myRDD is the result of a transformation and has no partitioner.
2) At the end of these two commands, shouldn't myRDD have the same partitioner as mySecondRDD instead of none?
3) How many shuffles will these two commands perform?
4) If I assign a partitioner to myRDD with partitionBy and manage to pass it as an argument to aggregateByKey, will I have reduced the number of shuffles to 1 instead of 2?
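For reference, the situation described above can be reproduced with a minimal sketch like the one below. The local master, the app name, the sample data, and the sum aggregation are all assumptions made for illustration; they stand in for someRDD, initValue, CombOp, and MergeOp. It only prints the partitioner field of each RDD, which is an Option[Partitioner].

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PartitionerCheck {
  def main(args: Array[String]): Unit = {
    // Assumption: a local master and an arbitrary app name, for illustration only.
    val conf = new SparkConf().setAppName("partitioner-check").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Hypothetical stand-in for someRDD.
    val someRDD = sc.parallelize(Seq("a", "b", "a", "c"))

    // map produces a new RDD of (key, value) pairs; it does not carry a partitioner.
    val myRDD = someRDD.map(word => (word, 1))

    // Hypothetical aggregation: sum the values per key with a zero value of 0.
    val mySecondRDD = myRDD.aggregateByKey(0)(_ + _, _ + _)

    // partitioner is an Option[Partitioner] on every RDD.
    println(s"myRDD partitioner:       ${myRDD.partitioner}")
    println(s"mySecondRDD partitioner: ${mySecondRDD.partitioner}")

    sc.stop()
  }
}
```

Calling mySecondRDD.toDebugString on the same RDD prints its lineage, which is one way to see where stage boundaries (shuffles) fall.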
I am sorry, I still don't quite understand how this works.