I'm new to Spark. Could somebody explain in more detail what is meant by "aggregateByKey() lets you return a result of a different type than the input value type, while reduceByKey() returns the same type as the input"? If I use reduceByKey() I can also get a different type of value in the output:
>>> rdd = sc.parallelize([(1,3),(2,3),(1,2),(2,5)])
>>> rdd.collect()
[(1, 3), (2, 3), (1, 2), (2, 5)]
>>> rdd.reduceByKey(lambda x, y: str(x) + str(y)).collect()
[(2, '35'), (1, '32')]
As we can see, the input values are ints and the output values are strs. Or am I misunderstanding this difference? What is the point of aggregateByKey() then?
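For comparison, here is how I would write the same string concatenation with aggregateByKey(), based on my reading of the docs (the empty-string zero value and the two lambdas are my own guesses, and the exact output order may vary with partitioning):

>>> # zeroValue '' fixes the result type (str) up front;
>>> # the seq function folds an int value into the str accumulator,
>>> # the comb function merges two partial str results across partitions
>>> rdd.aggregateByKey('', lambda acc, v: acc + str(v), lambda a, b: a + b).collect()
[(2, '35'), (1, '32')]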