
How does Spark reduce work for this example?

val num = sc.parallelize(List(1,2,3))
val result = num.reduce((x, y) => x + y)
res: Int = 6


val result = num.reduce((x, y) => x + (y * 10))
res: Int = 321

I understand the first result (1 + 2 + 3 = 6). For the second result, I expected 60, but that's not what I get. Can someone explain? Here is how I thought it would be computed:

Step 1: 0 + (1 * 10) = 10
Step 2: 10 + (2 * 10) = 30
Step 3: 30 + (3 * 10) = 60
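For comparison, the step-by-step expectation above is exactly what a sequential left fold with a zero of 0 would compute (plain Scala, no Spark needed):

```scala
// Sequential left fold matching the three steps above:
// 0 + 1*10 = 10, then 10 + 2*10 = 30, then 30 + 3*10 = 60
val expected = List(1, 2, 3).foldLeft(0)((acc, y) => acc + y * 10)
// expected: Int = 60
```

Spark's reduce does not work this way: it takes no zero element and may group the elements differently across partitions, which is why the actual result differs.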

Update: As per the Spark documentation:

The function should be commutative and associative so that it can be computed correctly in parallel.

https://spark.apache.org/docs/latest/rdd-programming-guide.html
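The quoted requirement can be checked directly in plain Scala: the function (x, y) => x + y * 10 is neither commutative nor associative, so Spark is free to produce different results depending on how the elements are grouped.

```scala
val f = (x: Int, y: Int) => x + y * 10

// Not commutative: f(x, y) != f(y, x)
f(1, 2)        // 1 + 2*10  = 21
f(2, 1)        // 2 + 1*10  = 12

// Not associative: f(f(x, y), z) != f(x, f(y, z))
f(f(1, 2), 3)  // 21 + 3*10 = 51
f(1, f(2, 3))  // 1 + 32*10 = 321
```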

user0000

1 Answer

(2, 3) -> 2 + 3 * 10 = 32
(1, (2, 3)) -> (1, 32) -> 1 + 32 * 10 = 321
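This grouping can be reproduced in plain Scala. Assuming (hypothetically) that the RDD was split into two partitions, [1] and [2, 3], Spark would fold each partition separately and then merge the per-partition results:

```scala
val f = (x: Int, y: Int) => x + y * 10

// Hypothetical partitioning: [1] and [2, 3]
val part1 = 1                 // a single-element partition reduces to itself
val part2 = f(2, 3)           // 2 + 3*10 = 32
val merged = f(part1, part2)  // 1 + 32*10 = 321
```

With a different partitioning or merge order, the same call could just as well return 51 or some other value.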

A reducer (in general, not just in Spark) takes a pair, applies the reduce function, then applies it again to the result and the next element, until all elements have been consumed. The order is implementation-specific (or even nondeterministic when run in parallel), but as a rule it should not affect the end result; that is exactly why the function must be commutative and associative.
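With a function that is commutative and associative, such as plain addition, the grouping genuinely does not matter, which is easy to check in plain Scala:

```scala
val xs = List(1, 2, 3)

// Left-to-right grouping: (1 + 2) + 3
xs.reduceLeft(_ + _)   // 6
// Right-to-left grouping: 1 + (2 + 3)
xs.reduceRight(_ + _)  // 6
```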

See also this answer: https://stackoverflow.com/a/31660532/290036

Horatiu Jeflea