
This is perhaps very basic, but I am certainly missing something here. It is all within a method, where I increment/alter a variable inside a child scope (it could be inside an if block or, as here, inside a map).

However, the variable ends up unchanged. For example, here sum remains zero after the map, whereas it should come to 3L.

What am I missing?

val resultsMap = scala.collection.mutable.Map.empty[String, Long]
resultsMap("0001") = 0L
resultsMap("0003") = 2L
resultsMap("0007") = 1L
var sum = 0L
resultsMap.mapValues(x => {sum = sum + x})
// I first wrote the shorter version below, then got worried and wrote the more explicit one above; same behaviour
// resultsMap.mapValues(sum += _)
println("total of counts for txn ="+sum) // sum still 0

Update: I see similar behaviour where a loop does not update a variable declared outside the loop. I have been looking for documentation on variable scoping but have not found a definitive source yet. All help is appreciated.

var cnt: Int = 0
rdd.foreach(article => {
  if (<something>) {
    println(<something>) // being printed
    cnt += 1
    println("counter is now " + cnt) // printed correctly
  }
})
Raghav
    Possible duplicate of [update variables using map function on spark](http://stackoverflow.com/questions/32774527/update-variables-using-map-function-on-spark) – The Archetypal Paul Mar 22 '17 at 14:22
  • You have two different problems. The first (`mapValues` producing a view) has been answered. Your edited one is because Spark runs the loop across many workers, each of which gets its own copy of `cnt`, and the originating one is NOT updated. See http://stackoverflow.com/questions/32774527/update-variables-using-map-function-on-spark?rq=1 (and a bunch of other questions) – The Archetypal Paul Mar 22 '17 at 14:24
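The comment above attributes the second problem to Spark shipping a copy of the closure to each worker. The sketch below imitates that on a single JVM without Spark, assuming plain Java serialization is a fair stand-in for how Spark ships a task closure to an executor (Spark's real path has more steps). The names ClosureCopyDemo, roundTrip and run are hypothetical. It relies on Scala 2.12+ function literals being serializable and on a captured var being stored in a serializable holder (scala.runtime.IntRef).

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object ClosureCopyDemo {
  // Serializes and deserializes obj, producing an independent copy.
  // Roughly what happens when a closure is sent to a remote executor.
  def roundTrip[T](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T]
    finally in.close()
  }

  // Returns (cnt after running the shipped copy, cnt after running the local closure).
  def run(): (Int, Int) = {
    var cnt = 0
    val increment: Int => Unit = _ => cnt += 1

    val shipped = roundTrip(increment) // the "executor" copy, with its own cnt
    (1 to 3).foreach(shipped)          // increments the copy's cnt only
    val afterShipped = cnt             // the driver's cnt is untouched

    (1 to 3).foreach(increment)        // the local closure shares the driver's cnt
    (afterShipped, cnt)
  }

  def main(args: Array[String]): Unit =
    println(run())
}
```

In actual Spark code, the usual way out is to let Spark compute and return the value instead of mutating a driver-side var, for example rdd.filter(<something>).count(), or to use an accumulator (SparkContext.longAccumulator in Spark 2.x).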

1 Answer


You should proceed like this:

val sum = resultsMap.values.reduce(_+_)

You just get your values and then add them up with reduce.
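A minimal sketch of the same idea, assuming only the standard library. It puts the reduce form next to values.sum, which gives the same total here but also handles an empty map (reduce throws UnsupportedOperationException when there are no values). The object and method names are hypothetical.

```scala
object SumValues {
  // Adds up all values; returns 0 for an empty map.
  def total(m: scala.collection.Map[String, Long]): Long =
    m.values.sum

  // The reduce form from the answer; throws UnsupportedOperationException on an empty map.
  def totalWithReduce(m: scala.collection.Map[String, Long]): Long =
    m.values.reduce(_ + _)

  def main(args: Array[String]): Unit = {
    val resultsMap = scala.collection.mutable.Map("0001" -> 0L, "0003" -> 2L, "0007" -> 1L)
    println(total(resultsMap))
    println(totalWithReduce(resultsMap))
    println(total(Map.empty[String, Long]))
  }
}
```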

EDIT:

The reason sum stays unchanged is that mapValues produces a view. Among other things, this means the new map's values are not computed until the view is actually used, so in this case the code block that updates sum is simply never executed.
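One way to observe this laziness directly is to count how often the function passed to mapValues runs. The sketch below assumes Scala 2.12 or 2.13 (in 2.13 mapValues is deprecated and returns a MapView, so expect a deprecation warning) and that warnings are not treated as errors. MapValuesLaziness and callCounts are hypothetical names.

```scala
import scala.collection.mutable

object MapValuesLaziness {
  // Records how many times the mapping function has run at three points in time.
  def callCounts(): List[Int] = {
    val resultsMap = mutable.Map("0001" -> 0L, "0003" -> 2L, "0007" -> 1L)
    var calls = 0
    val mapped = resultsMap.mapValues { x => calls += 1; x * 10 }

    val afterCreate = calls // creating the view evaluates nothing
    mapped("0003")          // evaluates the function for this one key
    val afterFirst = calls
    mapped("0003")          // evaluates it again: results are not cached
    val afterSecond = calls

    List(afterCreate, afterFirst, afterSecond)
  }

  def main(args: Array[String]): Unit =
    println(callCounts())
}
```

The last step also shows why side effects inside mapValues are fragile: every access to the view re-runs the function, so a counter updated there can be incremented more than once.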

To see this, you can "force" the view to "materialize" (compute the new map) and check that sum is updated as expected:

var sum = 0L
resultsMap.mapValues(x => {sum = sum + x}).view.force
println("SUM: " + sum) // prints 3
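If the goal really is a side effect over every value, a strict traversal avoids the view problem altogether. A minimal sketch, assuming only the standard library and using hypothetical names: foreach runs its function immediately, once per value, and foldLeft gives the same total without a var.

```scala
import scala.collection.mutable

object StrictSideEffect {
  // Mutates a local var from foreach, which is strict: it runs right away, once per value.
  def sumWithForeach(m: mutable.Map[String, Long]): Long = {
    var sum = 0L
    m.values.foreach(sum += _)
    sum
  }

  // Same total without a var: foldLeft threads the accumulator explicitly.
  def sumWithFold(m: mutable.Map[String, Long]): Long =
    m.values.foldLeft(0L)(_ + _)

  def main(args: Array[String]): Unit = {
    val resultsMap = mutable.Map("0001" -> 0L, "0003" -> 2L, "0007" -> 1L)
    println(sumWithForeach(resultsMap))
    println(sumWithFold(resultsMap))
  }
}
```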

See related discussion here: Scala: Why mapValues produces a view and is there any stable alternatives?

meucaa
  • Appreciate your pointer about mapValues giving only a view, and therefore the effect. However, I see similar behaviour elsewhere and I believe it's linked to variable scoping. I have updated the question. – Raghav Mar 22 '17 at 02:44