
How does Spark reduce work for this example?

val num = sc.parallelize(List(1,2,3))
val result = num.reduce((x, y) => x + y)
res: Int = 6


val result = num.reduce((x, y) => x + (y * 10))
res: Int = 321

I understand the first result (1 + 2 + 3 = 6). For the second result, I expected 60, but that's not what I get. Can someone explain? Here is how I thought it would be computed:

Step 1: 0 + (1 * 10) = 10
Step 2: 10 + (2 * 10) = 30
Step 3: 30 + (3 * 10) = 60
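For comparison, the step-by-step expectation above is exactly what a sequential left fold with a zero of 0 would compute (plain Scala, no Spark needed):

```scala
// Sequential left fold matching the three steps above:
// 0 + 1*10 = 10, then 10 + 2*10 = 30, then 30 + 3*10 = 60
val expected = List(1, 2, 3).foldLeft(0)((acc, y) => acc + y * 10)
// expected: Int = 60
```

Spark's reduce does not work this way: it takes no zero element and may group the elements differently across partitions, which is why the actual result differs.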

Update: As per the Spark documentation:

The function should be commutative and associative so that it can be computed correctly in parallel.

https://spark.apache.org/docs/latest/rdd-programming-guide.html
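The quoted requirement can be checked directly in plain Scala: the function (x, y) => x + y * 10 is neither commutative nor associative, so Spark is free to produce different results depending on how the elements are grouped.

```scala
val f = (x: Int, y: Int) => x + y * 10

// Not commutative: f(x, y) != f(y, x)
f(1, 2)        // 1 + 2*10  = 21
f(2, 1)        // 2 + 1*10  = 12

// Not associative: f(f(x, y), z) != f(x, f(y, z))
f(f(1, 2), 3)  // 21 + 3*10 = 51
f(1, f(2, 3))  // 1 + 32*10 = 321
```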

user0000

1 Answer

(2, 3) -> 2 + 3 * 10 = 32
(1, (2, 3)) -> (1, 32) -> 1 + 32 * 10 = 321
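This grouping can be reproduced in plain Scala. Assuming (hypothetically) that the RDD was split into two partitions, [1] and [2, 3], Spark would fold each partition separately and then merge the per-partition results:

```scala
val f = (x: Int, y: Int) => x + y * 10

// Hypothetical partitioning: [1] and [2, 3]
val part1 = 1                 // a single-element partition reduces to itself
val part2 = f(2, 3)           // 2 + 3*10 = 32
val merged = f(part1, part2)  // 1 + 32*10 = 321
```

With a different partitioning or merge order, the same call could just as well return 51 or some other value.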

A reducer (in general, not just in Spark) takes a pair, applies the reduce function, then applies it again to the result and the next element, until all elements have been consumed. The order is implementation-specific (or even nondeterministic when run in parallel), but as a rule it should not affect the end result; that is exactly why the function must be commutative and associative.
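With a function that is commutative and associative, such as plain addition, the grouping genuinely does not matter, which is easy to check in plain Scala:

```scala
val xs = List(1, 2, 3)

// Left-to-right grouping: (1 + 2) + 3
xs.reduceLeft(_ + _)   // 6
// Right-to-left grouping: 1 + (2 + 3)
xs.reduceRight(_ + _)  // 6
```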

See also this answer: https://stackoverflow.com/a/31660532/290036

Horatiu Jeflea