I would like to take an arbitrary function at runtime that may or may not be defined with the @everywhere macro, and run it in a distributed way.

My first, naive attempt was to simply use the function inside pmap:

@assert nprocs() > 1
function g(f)
    pmap(1:5) do x
        f(x)
    end
end

addone(x) = x+1
subone(x) = x-1
g(addone)
g(subone)

This, of course, did not work and resulted in:

On worker 2:
UndefVarError: `#addone` not defined

Next I tried passing the function f as an argument of pmap:

@assert nprocs() > 1

function g(f)
    pmap(zip(1:5, Iterators.repeated(f))) do (x,f)
        f(x)
    end
end

addone(x) = x+1
subone(x) = x-1
g(addone)
g(subone)

This also did not work; it threw the same error:

On worker 2:
UndefVarError: `#addone` not defined

Now I am at a loss. Surely something like this must be possible in Julia.

tillfalko

2 Answers

It's possible, but you shouldn't do it because it can easily be too much work.

Fundamentally, the problem is that the method exists on only one of the processes where you want to call it. The proper fix is prevention: put @everywhere in front of the source code that defines the method. Depending on what the method needs, that could be the function block itself, the include call that runs a file, or the using/import of a package. Bear in mind that, as a macro, @everywhere doesn't take existing instances and copy them into the other processes; it just evaluates the source code expression that follows it in each process.
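As a minimal sketch of that prevention, here is the question's g with the two functions defined on every process up front. The addprocs(2) call is an assumption made so the snippet runs on its own; use whatever worker setup you already have.

```julia
using Distributed

# Assumption for a self-contained run: start two workers if none exist yet.
nprocs() == 1 && addprocs(2)

# Define the methods on the master and on every worker.
@everywhere addone(x) = x + 1
@everywhere subone(x) = x - 1

# g itself only needs to exist on the master; the do-block closure is sent
# to the workers, and the function it captures is already defined there.
function g(f)
    pmap(1:5) do x
        f(x)
    end
end

g(addone)
g(subone)
```

Workers added after the @everywhere lines have run will not have the definitions, so add workers first.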

That said, it's possible to derive a method's defining expression from a function instance and its argument types (CodeTracking.jl makes this easy), and it's possible to use @eval and @everywhere to evaluate that expression on the other processes. This is doable in your simple example, but in general it gets more complicated. The method definition alone does not replicate the namespace it was defined in or remember how it was evaluated, so you would need to separately derive expressions for the related modules or global variables (AFAIK no package makes this easy). Pulling this off correctly is much harder and messier than the aforementioned prevention.
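To illustrate only the evaluation half of that idea, here is a rough sketch that assumes you already hold the definition as an expression (written by hand here, not derived from an existing method). It uses Distributed.remotecall_eval, which is not exported and is what @everywhere calls under the hood, so treat it as an implementation detail that could change. The addprocs(2) call is likewise just an assumption for a self-contained run.

```julia
using Distributed

# Assumption for a self-contained run: start two workers if none exist yet.
nprocs() == 1 && addprocs(2)

# Hypothetical starting point: the method definition as an expression.
# Deriving this from an already-defined method is the hard part and is
# not shown here.
def = :(addone(x) = x + 1)

# Evaluate the expression in Main on every process, master included.
Distributed.remotecall_eval(Main, procs(), def)

pmap(addone, 1:5)
```

This only works because addone depends on nothing outside Base. Any globals, modules, or helper functions it used would need the same treatment.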

BatWannaBe

@BatWannaBe is totally right that you should not do this and should just use @everywhere.

However, if you want to do it anyway, here is a code snippet.

First, we perform the setup:

using Distributed
addprocs(2)
@everywhere using Serialization

addone(x::Int) = x+1 + 100myid()

Now we move the function to the other workers:

# the name of the function to be moved around
fname = :addone 

# Serializing methods of the function fname to a buffer
buf = IOBuffer()
serialize(buf, methods(eval(fname)))

# Deserializing the function on remote workers
# Note that there are two steps
# 1. creating an empty function 
# 2. providing methods
Distributed.remotecall_eval(Main, workers(), quote
 function $fname end 
 deserialize(seekstart($buf))
end)

Now we can test what we did:

julia> fetch(@spawnat 3 methods(addone))
# 1 method for generic function "addone" from Main:
 [1] addone(x::Int64)
     @ REPL[3]:1

julia> fetch(@spawnat 3 addone(4))
305
Przemyslaw Szufel
  • +1 for showing that methods can be serialized and deserialized like that. I prefer working with expressions I can read, but from what I could tell, this is fairly robust. – BatWannaBe Aug 18 '23 at 00:25
  • Thank you, and you are both right that this does not seem like a good thing to do. Perhaps I am asking for the wrong thing. What if I wanted to write a function that Monte Carlo integrates a function f between a lower and an upper bound? Is there no clean way to parallelize this? – tillfalko Aug 18 '23 at 10:16
  • You just do `@everywhere function f(x)`. It is a good idea to have the same function on all workers. What you need to distribute across the workers is the data. For your use case I would suggest just using `pmap`. – Przemyslaw Szufel Aug 18 '23 at 15:22
  • You really do need to define the method at each and every worker process. In simple cases where your function accesses no globals and exists in the Main module, it's feasible to do that after the method already exists at one process. Otherwise, put `@everywhere` in front of a module, include, import, or begin block; run the entire expression you need on the other processes in the first place. – BatWannaBe Aug 18 '23 at 21:52
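
For the Monte Carlo integration case raised in the comments, one possible sketch along the lines suggested there: define the integrand and the per-chunk worker function with `@everywhere`, then let `pmap` distribute the chunks of samples. The names `mc_partial_sum` and `mc_integrate`, the example integrand, the chunking scheme, and the `addprocs(2)` call are all assumptions made for illustration, not something from the thread.

```julia
using Distributed

# Assumption for a self-contained run: start two workers if none exist yet.
nprocs() == 1 && addprocs(2)

# The integrand must exist on every process.
@everywhere f(x) = x^2

# Hypothetical helper: sum of f at n uniform random points in [a, b].
@everywhere function mc_partial_sum(f, a, b, n)
    s = 0.0
    for _ in 1:n
        s += f(a + (b - a) * rand())
    end
    return s
end

# Hypothetical driver: split the samples into chunks and hand one chunk
# to each pmap task. Only the chunk index travels to the workers.
function mc_integrate(f, a, b; nsamples = 1_000_000, nchunks = 4 * nworkers())
    chunk = cld(nsamples, nchunks)  # samples per chunk, rounded up
    sums = pmap(_ -> mc_partial_sum(f, a, b, chunk), 1:nchunks)
    return (b - a) * sum(sums) / (chunk * nchunks)
end

estimate = mc_integrate(f, 0.0, 1.0)  # estimates the integral of x^2 on [0, 1]
```

The driver only needs to exist on the master. What has to be defined everywhere is whatever the workers actually call, here `f` and `mc_partial_sum`.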