In a Scala program I wrote, I have a scala.collection.Map that maps a String to some calculated values. In detail it's a Map[String, (Double, immutable.Map[String, Double], Double)]. I know that's ugly and should (and will) be wrapped. Now, if I do this:

stats.map { case(c, (prior, pwc, denom)) => {
  println(c)
  ...
  }
}

it takes about 30 seconds to print roughly 50 values of c! The println is just a test statement. The actual calculation I need was even slower (I aborted after 1 minute of complete silence). However, if I do it like this:

stats.mapValues { case (prior, pwc, denom) => {
  println(prior)
  ...
  }
}

I don't run into these performance issues ... Can anyone explain why this is happening? Am I not following some important Scala guidelines?

Thanks for the help!

Edit:

I investigated the behaviour further. My guess is that the bottleneck comes from accessing the Map data structure. If I do the following, I have the same performance issues:

classes.foreach{c => {
  println(c)
  val ps = stats(c)
  }
}

Here classes is a List[String] that stores the keys of the Map externally. Without the access to stats(c) no performance losses occur.

kafman
  • possible duplicate of [In Scala Map, difference between mapValues and transform](http://stackoverflow.com/questions/25635803/in-scala-map-difference-between-mapvalues-and-transform) – Ben Reich Nov 17 '14 at 22:07

1 Answer

mapValues actually returns a view on the original map, which can lead to unexpected performance issues. From this blog post:

...here is a catch: map and mapValues are different in a not-so-subtle way. mapValues, unlike map, returns a view on the original map. This view holds references to both the original map and to the transformation function (here (_ + 1)). Every time the returned map (view) is queried, the original map is first queried and the transformation function is called on the result.

I recommend reading the rest of that post for some more details.
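
To make the quoted behaviour concrete, here is a minimal sketch (mine, not from the blog post). The names `MapValuesDemo`, `slowIncrement` and `calls` are hypothetical and only serve to count how often the transformation function runs. It assumes the Scala 2.x standard library, where `mapValues` on a `Map` is lazy; in 2.13 the call still compiles but is deprecated in favour of `.view.mapValues`.

```scala
object MapValuesDemo {
  // Hypothetical counter: how many times the transformation function has run.
  var calls = 0

  // Stand-in for an expensive per-value computation.
  def slowIncrement(x: Int): Int = {
    calls += 1
    x + 1
  }

  // Looks up the same key three times in the result of mapValues.
  // The function is re-run on every lookup, so this returns 3.
  def lookupsViaMapValues(): Int = {
    calls = 0
    val view = Map("a" -> 1, "b" -> 2).mapValues(slowIncrement)
    val total = view("a") + view("a") + view("a")
    calls
  }

  // Same lookups against the result of map.
  // The function runs once per entry while the map is built (2 entries),
  // and the lookups add nothing, so this returns 2.
  def lookupsViaMap(): Int = {
    calls = 0
    val strict = Map("a" -> 1, "b" -> 2).map { case (k, v) => (k, slowIncrement(v)) }
    val total = strict("a") + strict("a") + strict("a")
    calls
  }
}
```

With a cheap function the difference does not matter, but if the function is expensive, every lookup on the `mapValues` result pays that cost again.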

Ben Reich
  • That's definitely good to know! But this blog post would suggest that `mapValues` can be slower than `map`. However, I'm experiencing the opposite... – kafman Nov 17 '14 at 22:12
  • Keyword being *can*. If you used `mapValues(someComplexFunction)` and then looked up the value stored for a given key many times, then you would be evaluating `someComplexFunction` many times (as opposed to the single time per key that you would get with `map`). – Dylan Nov 17 '14 at 22:14
  • Agreed. But the code above only contains a simple `println`, I wouldn't expect such a huge difference in this case ... – kafman Nov 17 '14 at 22:32
  • Turns out I had already used a `mapValues` to compute the `stats` Map, and that step contains some heavy computations. Turning it into a `map` makes the program slow at that point instead. Thanks for the help! – kafman Nov 17 '14 at 22:57
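
For later readers, a rough sketch of what that last comment describes: building the map strictly with `map` pays for the heavy computation once per key up front, and the repeated `stats(c)` lookups from the question's edit are then cheap. Everything here (`StrictStatsDemo`, `heavyStat`, the `counts` data) is a hypothetical stand-in, since the question does not show how `stats` is computed.

```scala
object StrictStatsDemo {
  // Hypothetical counter for how often the heavy computation runs.
  var evaluations = 0

  // Hypothetical stand-in for the heavy per-class computation.
  def heavyStat(count: Int): Double = {
    evaluations += 1
    math.log(count.toDouble)
  }

  // Made-up input data; `classes` holds the keys externally, as in the question.
  val counts: Map[String, Int] = Map("spam" -> 40, "ham" -> 60)
  val classes: List[String] = counts.keys.toList

  // Builds `stats` strictly with map, then looks up every class `rounds` times.
  // Returns (evaluations after building, evaluations after all lookups).
  def run(rounds: Int): (Int, Int) = {
    evaluations = 0
    val stats = counts.map { case (c, n) => c -> heavyStat(n) }
    val afterBuild = evaluations
    for (_ <- 1 to rounds; c <- classes) stats(c)
    (afterBuild, evaluations)
  }
}
```

Here `StrictStatsDemo.run(50)` gives the same count before and after the lookups, because the work happened when the map was built. Had `stats` been built with `mapValues`, each of the 100 lookups would have called `heavyStat` again.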