I have an RDD of mutable.Map[Int, Array[Double]], and I would like to reduce the maps by their Int key and compute the element-wise mean of the arrays that share a key.
For example, I have:
    Map(1 -> Array(0.1, 0.1), 2 -> Array(0.3, 0.2))
    Map(1 -> Array(0.1, 0.4))
What I want:
    Map(1 -> Array(0.1, 0.25), 2 -> Array(0.3, 0.2))
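To make the intended result concrete, here is a minimal sketch on plain local Scala collections (no Spark involved), written as a script or REPL snippet. The names `Acc`, `toAcc`, `merge` and `means` are made up for illustration, and it assumes that all arrays under the same key have the same length. It keeps a per-key sum and count rather than a running mean, so the result stays correct when more than two maps are combined.

```scala
// Hypothetical helper names; local collections only, no Spark.
// Accumulator: key -> (element-wise sum, number of arrays seen for that key)
type Acc = Map[Int, (Array[Double], Long)]

// Lift one input map into accumulator form: each array counts once.
def toAcc(m: collection.Map[Int, Array[Double]]): Acc =
  m.map { case (k, v) => (k, (v.clone(), 1L)) }.toMap

// Merge two accumulators: add sums element-wise and add counts for shared keys.
def merge(a: Acc, b: Acc): Acc =
  b.foldLeft(a) { case (acc, (k, (sumB, nB))) =>
    acc.get(k) match {
      case Some((sumA, nA)) =>
        val sum = sumA.zip(sumB).map { case (x, y) => x + y }
        acc.updated(k, (sum, nA + nB))
      case None =>
        acc.updated(k, (sumB, nB))
    }
  }

// Divide each sum by its count to get the per-key element-wise mean.
def means(acc: Acc): Map[Int, Array[Double]] =
  acc.map { case (k, (sum, n)) => (k, sum.map(_ / n)) }

val maps = Seq(
  Map(1 -> Array(0.1, 0.1), 2 -> Array(0.3, 0.2)),
  Map(1 -> Array(0.1, 0.4))
)

// Key 1 ends up close to Array(0.1, 0.25); key 2 keeps its single array.
val result = means(maps.map(toAcc).reduce(merge))
```

Up to floating-point rounding, `merge` gives the same result regardless of the order in which the maps are combined.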
The problem is that I don't know how reduce works on maps. In addition, I have to do the reduction per partition, collect the results on the driver, and reduce them there too. I found the foreachPartition method, but I don't know whether it is meant to be used in cases like this.
Any ideas?