
So for a homework assignment I am supposed to play with several threading mechanisms, using a simple integration of a function whose result should be pi. The implementation is supposed to handle an interval of over 500 billion. My current implementation handles the loop up to about 50 million on a 2 GB heap. Now my question is: why does the implementation use so much memory? (I think it's because the range has to be built in advance; is this true?) And how do I improve memory use? Is it possible with parallel collections, or am I forced to use a thread pool for something like this?

Note: I will get full credit with the following implementation; this question is for my intellectual curiosity and my dream of becoming more fluent in Scala.


object Pi {
  def calculate_pi(interval: Int): Double = {
    val start: Long = System.currentTimeMillis
    var duration: Long = 0
    var x: Double = 2.0/interval
    var y: Double = 0.0
    var mypi: Double = 0.0

    (x until 1.0 by 1.0/interval).par.foreach(i => {
      y = 4.0/(1.0+x*x)
      mypi += (1.0/interval)*y
    })

    duration = (System.currentTimeMillis - start)
    println("scala calculate_pi\t" + interval + "\t" + duration + "ms\t" + mypi)
    return mypi
  }
}




object Main extends App {
  println("Please input the interval for the calculation")
  if(args.length < 1) { sys.exit }
  val interval = args(0).toInt 
  Pi.calculate_pi_seq(interval)
  Pi.calculate_pi(interval)
}
Bbatha

2 Answers


This is all kinds of wrong:

(x until 1.0 by 1.0/interval).par.foreach(i => {
   y = 4.0/(1.0+x*x)
   mypi += (1.0/interval)*y
 })

The first problem is that all computations of y are identical: you are not using i while computing it. Since x doesn't change, all threads compute the same value.

And here's the second problem: you are computing mypi (and y) in parallel. That means multiple threads are reading and writing both mypi and y at the same time.

Let's consider one execution to understand the problem. Say the first thread starts running, computes y, and then reads y and mypi. That thread then pauses, and all the other threads run. Finally, that thread resumes and writes the result of its computation to mypi. In this case, the computations of all the other threads are wasted, because the final value was set by that one thread.

That was a simple case. Basically, you cannot predict at all what will happen for each of those reads and writes to mypi (y is easier, since all threads assign the same value to it).

And, yes, when you call .par on a NumericRange, it creates a collection holding every value of that NumericRange, which is why your memory use grows with the interval.
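A sketch of one way to address both problems (a hypothetical rewrite, not the assignment's required form): iterate over a plain Int Range of indices, derive x from the index i, and let `aggregate` give each chunk of the parallel collection its own accumulator so no var is shared between threads:

```scala
object PiFixed {
  // Midpoint-rule integration of 4/(1+x^2) over [0,1]; the sum converges to pi.
  // The Int Range (0 until intervals) is cheap to create, and `aggregate`
  // folds each chunk into its own accumulator, so no state is shared mutably.
  def calculatePi(intervals: Int): Double = {
    val width = 1.0 / intervals
    (0 until intervals).par.aggregate(0.0)(
      (acc, i) => {                   // fold within one chunk, actually using i
        val x = (i + 0.5) * width     // midpoint of the i-th sub-interval
        acc + 4.0 / (1.0 + x * x) * width
      },
      _ + _                           // combine per-chunk partial sums
    )
  }
}
```

On Scala 2.13+ the parallel collections live in the separate scala-parallel-collections module; on the 2.9/2.10 versions current when this was asked, `.par` and `aggregate` are available directly.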

Daniel C. Sobral
  • Thanks for the x*x thing; I wasn't paying attention when I was translating the sequential C version. I'm not sure why modifying mypi in parallel should matter: additions to integers are atomic, if I am not mistaken. Obviously in general this is not good practice, but I think in this simple case it should be OK. Is there a suggestion to resolve this problem, i.e. should I store all of the intermediate results in a list and do a summation on it? – Bbatha Feb 10 '12 at 00:10
  • Reading an int is atomic, and writing an int is atomic, but read-modify-write is not atomic (see "lost update"). A second problem is that mypi can be cached thread-locally (see the Java keyword volatile). One option is to use a map-reduce algorithm. – user482745 Feb 29 '12 at 09:07
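The lost-update effect this comment describes is easy to demonstrate (a throwaway sketch, not part of the original code): incrementing a plain var from a parallel collection silently drops updates, while java.util.concurrent.atomic.AtomicInteger performs the read-modify-write atomically:

```scala
import java.util.concurrent.atomic.AtomicInteger

object LostUpdateDemo extends App {
  var plain = 0                        // unsynchronized read-modify-write
  val atomic = new AtomicInteger(0)    // atomic read-modify-write

  (1 to 1000000).par.foreach { _ =>
    plain += 1                  // racy: two threads can read the same old
                                // value and both write old+1, losing a count
    atomic.incrementAndGet()    // never loses an increment
  }

  println("plain  = " + plain)         // usually well below 1000000
  println("atomic = " + atomic.get)    // always exactly 1000000
}
```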

Not knowing the underlying application, I have learned by experiment that if you use the method par on a Range (for instance), it is instantiated in advance, as you pointed out.

However, it looks like you are only using the collection to take advantage of the parallelization, in other words to calculate a piece of code which is somewhat unrelated to the collection itself: the value i is not even being used. The foreach loop is thus pretty much redundant, since you are only interested in a y and an x value. It might seem like a large amount of work for something a simple for-loop could accomplish.

That said, other types of parallelization are pretty easy in Scala. What about using actors? They're lightweight and extremely simple. Otherwise worker threads or maybe even plain Java threads might do the trick.
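The worker-thread route the question asks about needs no collection at all. A hypothetical sketch using java.util.concurrent (names made up): each worker strides through the indices with a plain while loop, so memory stays constant no matter how large intervals is, which is the shape that scales to 500 billion:

```scala
import java.util.concurrent.{Callable, Executors}

object PiPool {
  // Worker w handles indices w, w+workers, w+2*workers, ... via a while loop:
  // constant memory per worker, no range ever materialized.
  def calculatePi(intervals: Long, workers: Int): Double = {
    val width = 1.0 / intervals
    val pool  = Executors.newFixedThreadPool(workers)
    val tasks = (0 until workers).map { w =>
      new Callable[Double] {
        def call(): Double = {
          var sum = 0.0
          var i: Long = w
          while (i < intervals) {
            val x = (i + 0.5) * width   // midpoint of the i-th sub-interval
            sum += 4.0 / (1.0 + x * x) * width
            i += workers
          }
          sum                           // this worker's partial sum
        }
      }
    }
    val futures = tasks.map(t => pool.submit(t))  // start all workers
    val pi = futures.map(_.get()).sum             // block and combine
    pool.shutdown()
    pi
  }
}
```

The only per-worker state is a Long index and a Double accumulator, so the heap no longer limits the interval size.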

Jens Egholm
  • Er, what? `Range` is not a "collection" and takes no more memory whether it's `1 to 2` or `1 to 1000000`. I don't even understand what you mean by "only using collections to take advantage of parallelization" either: the looping looks pretty fundamental to this algorithm to me! – oxbow_lakes Feb 09 '12 at 11:37
  • Of course it is :) It's just a collection ranging from one value to another. I'm sorry for my unfortunate choice of words, though; what I meant was that if you use the method par, Scala will run off and calculate each element in the collection (which is otherwise not the case for Range or Stream). Regarding "using collections to take advantage of parallelization", I guess that is pretty obvious once the method par is used: par is simply Scala's collections presenting a smart alternative to creating threads, hence the "advantage" of collections. – Jens Egholm Feb 09 '12 at 12:09