1

I am using parallel collection to run some code in parallel. Here is the code

val threadIds = new ConcurrentSkipListSet[Long]()
val pool = new ForkJoinPool(250)
val forkJoinSupport = new ForkJoinTaskSupport(pool)
list.par.taskSupport = forkJoinSupport
list.par.map{ element =>
  threadIds.add(Thread.currentThread().getId)
  ...
}
println(s"""No of actual threads in pool: ${threadIds.size()}: Threads = ${threadIds.asScala.mkString(",")}""")

The output from the println statement always is 64 whereas the expected thread count is 250

No of actual threads in pool: 64: Threads = ...

Am I missing something here?

Note: The machine in which this application runs has 8 cores.

Raj
  • 2,368
  • 6
  • 34
  • 52
  • 3
    Looks like `64` is the size of your list. Also note that you set `taskSupport` on one `par` object, and then execute `map` on a _different_ one (that still has default support). Also, seeing 64 different thread ids does not mean that you had 64 threads running _at the same time_. And also, it most likely doesn't make any sense to use 64 threads (let alone 250) for a concurrent calculation on an 8-core machine. – Dima Jan 13 '23 at 05:09
  • @Dima My bad. Thank you! Yep, I was using two different parallel collection. Using a single par fixed the issue. The list size is actually around 15000 and I am using the threads to trigger individual spark jobs – Raj Jan 13 '23 at 12:26

1 Answers1

2

As mentioned in a comment, you are not reading the size of the pool, but rather that of the collection (and again as mentioned there, you are creating two separate parallel collections by invoking par twice and working on them as if they were one). Furthermore, the ForkJoinPool makes no guarantees with regards to the size of the pool when the task queue is empty. As the following Scala shell session shows, threads are spun up lazily based on whether they are needed, and they are capped to the level of parallelism you ask for at construction:

scala> import java.util.concurrent.ForkJoinPool
import java.util.concurrent.ForkJoinPool

scala> val pool = new ForkJoinPool(250)
val pool: java.util.concurrent.ForkJoinPool = java.util.concurrent.ForkJoinPool@5e2a6991[Running, parallelism = 250, size = 0, active = 0, running = 0, steals = 0, tasks = 0, submissions = 0]

scala> pool.getPoolSize
val res3: Int = 0

scala> val sleep: Runnable = () => while (true) Thread.sleep(1000)
val sleep: Runnable = $Lambda$1182/0x0000000840644040@8585cdd

scala> for (_ <- 1 to 50) pool.execute(sleep)

scala> pool.getPoolSize
val res5: Int = 51

scala> pool.getPoolSize
val res6: Int = 51

scala> for (_ <- 1 to 200) pool.execute(sleep)

scala> pool.getPoolSize
val res8: Int = 250

scala> for (_ <- 1 to 200) pool.execute(sleep)

scala> pool.getPoolSize
val res10: Int = 250
stefanobaghino
  • 11,253
  • 4
  • 35
  • 63