0

I am having trouble with pmap() throwing a BoundsError when setting the values of array elements - my code works for 1 worker but not >1. I have written a minimum working example which roughly follows the real code flow:

  1. Get source data
  2. Define set of points over which to iterate
  3. Initialise array points to be calculated
  4. Calculate each array point

The main file:

#pmapdemo.jl
using Distributed
#addprocs(length(Sys.cpu_info())) # uncomment this line for error
@everywhere include(joinpath(@__DIR__, "pmapdemo2.jl"))

function main()
    # Get source data
    source = Dict{String, Any}("t"=>zeros(5),
                               "x"=>zeros(5,6),
                               "y"=>zeros(5,3),
                               "z"=>zeros(5,3))
    # Define set of points over which to iterate
    iterset = Dict{String, Any}("t"=>source["t"],
                                "x"=>source["x"],
                                "y"=>fill(2, size(source["t"])[1], 1),
                                "z"=>fill(2, size(source["t"])[1], 1))
    data = Dict{String, Any}()

    # Initialise array points to be calculated
    MyMod.initialisearray!(data, iterset)
    # Calculate each array point
    MyMod.calcarray!(data, iterset, source) 
    @show data
end

main()

The functionality file:

#pmapdemo2.jl
module MyMod

using Distributed
@everywhere using SharedArrays

# Initialise data array 
function initialisearray!(data, fieldset)
    zerofield::SharedArray{Float64, 4} = zeros(size(fieldset["t"])[1],
                                               size(fieldset["x"])[2],
                                               size(fieldset["y"])[2],
                                               size(fieldset["z"])[2])
    data["field"] = deepcopy(zerofield)
end                            

# Calculate values of array elements according to values in source
function calcpoint!((data, source, a, b, c, d))
    data["field"][a,b,c,d] = rand()
end

# Set values in array
function calcarray!(data, iterset, source)
    for a in eachindex(iterset["t"])
        # [additional functionality f(a) here]
        b = eachindex(iterset["x"][a,:])
        c = eachindex(iterset["y"][a,:])
        d = eachindex(iterset["z"][a,:])
        pmap(calcpoint!, Iterators.product(Iterators.repeated(data,1), Iterators.repeated(source,1), Iterators.repeated(a,1), b, c, d))
    end
end

end

The error output:

ERROR: LoadError: On worker 2:
BoundsError: attempt to access 0×0×0×0 Array{Float64,4} at index [1]
setindex! at ./array.jl:767 [inlined]
setindex! at /build/julia/src/julia-1.1.1/usr/share/julia/stdlib/v1.1/SharedArrays/src/SharedArrays.jl:500 [inlined]
_setindex! at ./abstractarray.jl:1043
setindex! at ./abstractarray.jl:1020
calcpoint! at /home/dave/pmapdemo2.jl:25
#112 at /build/julia/src/julia-1.1.1/usr/share/julia/stdlib/v1.1/Distributed/src/process_messages.jl:269
run_work_thunk at /build/julia/src/julia-1.1.1/usr/share/julia/stdlib/v1.1/Distributed/src/process_messages.jl:56
macro expansion at /build/julia/src/julia-1.1.1/usr/share/julia/stdlib/v1.1/Distributed/src/process_messages.jl:269 [inlined]
#111 at ./task.jl:259
Stacktrace:
 [1] (::getfield(Base, Symbol("##696#698")))(::Task) at ./asyncmap.jl:178
 [2] foreach(::getfield(Base, Symbol("##696#698")), ::Array{Any,1}) at ./abstractarray.jl:1866
 [3] maptwice(::Function, ::Channel{Any}, ::Array{Any,1}, ::Base.Iterators.ProductIterator{Tuple{Base.Iterators.Take{Base.Iterators.Repeated{Dict{String,Any}}},Base.Iterators.Take{Base.Iterators.Repeated{Dict{String,Any}}},Base.Iterators.Take{Base.Iterators.Repeated{Int64}},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64}}}) at ./asyncmap.jl:178
 [4] #async_usemap#681 at ./asyncmap.jl:154 [inlined]
 [5] #async_usemap at ./none:0 [inlined]
 [6] #asyncmap#680 at ./asyncmap.jl:81 [inlined]
 [7] #asyncmap at ./none:0 [inlined]
 [8] #pmap#213(::Bool, ::Int64, ::Nothing, ::Array{Any,1}, ::Nothing, ::Function, ::Function, ::WorkerPool, ::Base.Iterators.ProductIterator{Tuple{Base.Iterators.Take{Base.Iterators.Repeated{Dict{String,Any}}},Base.Iterators.Take{Base.Iterators.Repeated{Dict{String,Any}}},Base.Iterators.Take{Base.Iterators.Repeated{Int64}},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64}}}) at /build/julia/src/julia-1.1.1/usr/share/julia/stdlib/v1.1/Distributed/src/pmap.jl:126
 [9] pmap(::Function, ::WorkerPool, ::Base.Iterators.ProductIterator{Tuple{Base.Iterators.Take{Base.Iterators.Repeated{Dict{String,Any}}},Base.Iterators.Take{Base.Iterators.Repeated{Dict{String,Any}}},Base.Iterators.Take{Base.Iterators.Repeated{Int64}},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64}}}) at /build/julia/src/julia-1.1.1/usr/share/julia/stdlib/v1.1/Distributed/src/pmap.jl:101
 [10] #pmap#223(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Function, ::Base.Iterators.ProductIterator{Tuple{Base.Iterators.Take{Base.Iterators.Repeated{Dict{String,Any}}},Base.Iterators.Take{Base.Iterators.Repeated{Dict{String,Any}}},Base.Iterators.Take{Base.Iterators.Repeated{Int64}},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64}}}) at /build/julia/src/julia-1.1.1/usr/share/julia/stdlib/v1.1/Distributed/src/pmap.jl:156
 [11] pmap(::Function, ::Base.Iterators.ProductIterator{Tuple{Base.Iterators.Take{Base.Iterators.Repeated{Dict{String,Any}}},Base.Iterators.Take{Base.Iterators.Repeated{Dict{String,Any}}},Base.Iterators.Take{Base.Iterators.Repeated{Int64}},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64}}}) at /build/julia/src/julia-1.1.1/usr/share/julia/stdlib/v1.1/Distributed/src/pmap.jl:156
 [12] calcarray!(::Dict{String,Any}, ::Dict{String,Any}, ::Dict{String,Any}) at /home/dave/pmapdemo2.jl:20
 [13] main() at /home/dave/pmapdemo.jl:19
 [14] top-level scope at none:0
in expression starting at /home/dave/pmapdemo.jl:23

In pmapdemo2.jl, replacing data["field"][a,b,c,d] = rand() with @show a, b, c, d demonstrates that all workers are running and have full access to the variables being passed, however instead replacing it with @show data["field"] throws the same error. Surely the entire purpose of SharedArrays is to avoid this? Or am I misunderstanding how to use it with pmap?

This is a crosspost from the Julia discourse here.

DocDave
  • 146
  • 6

1 Answers1

1

pmap will do the work of passing the data to the processes, so you don't need to use SharedArrays. Typically, the function provided to pmap (and indeed map) will be a pure function (and therefore doesn't mutate any variable) which returns one element of an output array. That function is mapped across each element of the input array, and the pmap function will construct the output array for you. For example, in your case, the code may look a bit like this

calcpoint(source, (a,b,c,d)) = rand() # Or some function of source and the indices a,b,c,d
field["data"] = pmap(calcpoint, Iterators.repeated(source), Iterators.product(a,b,c,d))
Jonathan Deakin
  • 100
  • 2
  • 6