I have a function that I parallelize with pmap. I would like to run this function 4 times asynchronously, using 10 workers for each run, but I can't run two or more pmap calls at the same time.

I'm using Julia v1.1 on a 40-CPU Linux machine.

using Distributed
addprocs(4)

@everywhere function TestParallel(x)
    a = 0
    while a < 4
        println("Value = ",x, " in worker = ", myid())
        sleep(1)
        a += 1
    end
end

a = WorkerPool([2,3])
b = WorkerPool([4,5])

c = [i for i = 1:10]
@sync @async for i in c
    pmap(x-> TestParallel(x), a, c)
    pmap(x-> TestParallel(x), b, c)
end

I expect to have:

From worker 2:    Value = 1 in worker = 2
From worker 3:    Value = 2 in worker = 3
From worker 4:    Value = 3 in worker = 4
From worker 5:    Value = 4 in worker = 5

So the first two elements of c go to the first pmap and the next two elements go to the second pmap; then whichever finishes first gets the next two elements.

Instead, I get:

 From worker 2:    Value = 1 in worker = 2
 From worker 3:    Value = 2 in worker = 3
 From worker 2:    Value = 1 in worker = 2
 From worker 3:    Value = 2 in worker = 3

After the first pmap completes all elements of c, the second pmap starts over and processes all the elements again:

From worker 2:    Value = 9 in worker = 2
From worker 3:    Value = 10 in worker = 3
From worker 5:    Value = 2 in worker = 5
From worker 4:    Value = 1 in worker = 4

1 Answer

There are some problems with your question. @sync and @async use green threads (tasks) inside a single process, whereas you want to distribute your computations across workers. The syntax @sync @async [some code] spawns the code as one asynchronous task and then waits for it to complete, so it is effectively equivalent to plain [some code]. In your loop, the two pmap calls therefore still run one after the other.

While your question is not entirely clear, I will assume that you want to launch 2 pmaps in parallel, each using a separate worker pool (this seems like the most likely thing you are trying to do).
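To make the point about @sync @async concrete, here is a minimal sketch that needs no extra workers. The helper `work` and the 0.5 second sleep are made up for illustration; the timings are approximate and will vary from machine to machine.

```julia
# Hypothetical stand-in for a slow call such as pmap: sleeps, then returns its label.
work(label) = (sleep(0.5); label)

# One task wrapping both calls: they run back to back inside that single task,
# so the total time is roughly the sum of the two sleeps.
t_one_task = @elapsed @sync @async begin
    work("first")
    work("second")
end

# One task per call: both tasks sleep concurrently,
# so the total time is roughly that of a single sleep.
t_two_tasks = @elapsed @sync begin
    @async work("first")
    @async work("second")
end

println("one task:  ", round(t_one_task, digits = 2), " s")
println("two tasks: ", round(t_two_tasks, digits = 2), " s")
```

The second form is the pattern that lets two blocking calls overlap: each call gets its own @async, and a single enclosing @sync waits for both.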

In that case here is the code:

using Distributed
addprocs(4)

@everywhere function testpar2(x)
    for a in 0:3
        println("Value = $x [$a] in worker = $(myid())")
        sleep(0.2)
    end
    return 1000*myid()+x*x  #I assume you want to return some value
end


a = WorkerPool([2,3])
b = WorkerPool([4,5])

c = collect(1:10)

@sync begin
    @async begin 
        res1 = pmap(x-> testpar2(x), a, c)
        println("Got res1=$res1")
    end
    @async begin 
        res2 = pmap(x-> testpar2(x), b, c)
        println("Got res2=$res2")
    end
end

When running the above code you will see something like:

...
      From worker 5:    Value = 10 [3] in worker = 5
      From worker 2:    Value = 10 [3] in worker = 2
      From worker 3:    Value = 9 [3] in worker = 3
Got res2=[4001, 5004, 5009, 4016, 5025, 4036, 4049, 5064, 4081, 5100]
Got res1=[2001, 3004, 2009, 3016, 2025, 3036, 3049, 2064, 3081, 2100]
Task (done) @0x00000000134076b0

You can clearly see that both pmaps ran in parallel on different worker pools.
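If more than two pools are needed (the question mentions 4 runs with 10 workers each), the same idea can be written as a loop. This is only a sketch under assumptions: the pool size of 2, the name `square_on_worker`, and the way workers are grouped are all illustrative choices, not something taken from the question.

```julia
using Distributed
addprocs(4)

# Hypothetical task: returns the worker id together with the computed value.
@everywhere square_on_worker(x) = (myid(), x * x)

# Assumed grouping: split the available workers into pools of two.
pool_ids = [collect(ids) for ids in Iterators.partition(workers(), 2)]
pools = [WorkerPool(ids) for ids in pool_ids]

inputs = collect(1:10)
results = Vector{Any}(undef, length(pools))

# One task per pool; @sync waits until every pmap has finished.
@sync for (k, pool) in enumerate(pools)
    @async results[k] = pmap(square_on_worker, pool, inputs)
end

for (k, res) in enumerate(results)
    println("pool $k (workers $(pool_ids[k])): ", [value for (_, value) in res])
end
```

Each pmap still processes the whole of `inputs`, as in the code above; if the goal is for all workers to share one queue of inputs, a single pmap over one pool containing every worker would be the thing to try instead.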

Przemyslaw Szufel