I have an application with a set of objects that each need a lot of setting up (between 30 s and 1 minute per object). Once an object has been set up, I want to pass it a parameter vector (small, fewer than 50 floats) and get a couple of small arrays back. The computation per object is very fast compared with setting up the object. I need to run this fast method many, many times, and I have a cluster I can use.
My idea is that this should be a "pool" of workers, where each worker gets an object initialised with its own particular configuration (which differs from worker to worker), and that object stays in memory on the worker. Subsequently, a method of the object on each worker receives a vector and returns some arrays. These are then combined by the main program (order is not important).
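To make the intended pattern concrete, here is a minimal, framework-free sketch: one long-lived worker per configuration builds its model once, then serves parameter vectors broadcast by the main program, and results are gathered in whatever order they finish. It uses threads and `queue.Queue` only to stay self-contained; `worker_loop`, `start_workers`, `evaluate` and `stop_workers` are hypothetical names, and `Model` is a pure-Python stand-in for the real class. For CPU-heavy work the workers would need to be separate processes or cluster workers, but the structure would be the same.

```python
import queue
import threading


class Model:
    """Pure-Python stand-in for the expensive-to-build model."""

    def __init__(self, power):
        # In the real case this is the 30 s to 1 min set-up step.
        self.power = power

    def compute_me(self, x):
        return -sum(v ** self.power for v in x)


_STOP = object()  # sentinel telling a worker to shut down


def worker_loop(config, inbox, outbox):
    """Build one Model from `config` once, then serve requests until stopped."""
    model = Model(config)  # expensive initialisation happens exactly once
    while True:
        x = inbox.get()
        if x is _STOP:
            break
        outbox.put((config, model.compute_me(x)))


def start_workers(configs):
    """Start one long-lived thread per configuration; return its queues."""
    outbox = queue.Queue()  # shared: all workers report here
    inboxes, threads = [], []
    for config in configs:
        inbox = queue.Queue()  # private: only this worker reads it
        t = threading.Thread(target=worker_loop,
                             args=(config, inbox, outbox), daemon=True)
        t.start()
        inboxes.append(inbox)
        threads.append(t)
    return inboxes, outbox, threads


def evaluate(inboxes, outbox, x):
    """Broadcast one parameter vector to every worker and gather the results."""
    for inbox in inboxes:
        inbox.put(x)
    # Results arrive in completion order; order is not important here.
    return [outbox.get() for _ in inboxes]


def stop_workers(inboxes, threads):
    for inbox in inboxes:
        inbox.put(_STOP)
    for t in threads:
        t.join()


inboxes, outbox, threads = start_workers([1, 2, 3, 4, 5])
results = dict(evaluate(inboxes, outbox, [13, 23, 37]))  # {power: value}
stop_workers(inboxes, threads)
```

`evaluate` can be called repeatedly with new vectors; each call collects exactly one result per worker, so nothing is left over between calls.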
Here is a minimal working example (MWE), serial version first and dask version second:
```python
import datetime as dt
import itertools

import numpy as np
from dask.distributed import Client, LocalCluster

# Set up demo dask on local machine
cluster = LocalCluster()
client = Client(cluster)


class Model(object):
    def __init__(self, power):
        self.power = power

    def powpow(self, x):
        return np.power(x, self.power)

    def neg(self, x):
        return -x

    def compute_me(self, x):
        return sum(self.neg(self.powpow(x)))


# Set up the objects locally
bag = [Model(power)
       for power in [1, 2, 3, 4, 5]
       ]
x = [13, 23, 37]
result = [obj.compute_me(x) for obj in bag]


# Using dask
# Wrapper function to pass the local object
# and parameter
def wrap(obj, x):
    return obj.compute_me(x)


res = []
for obj, xx in itertools.product(bag, [x,]):
    res.append(client.submit(wrap, obj, xx))
result_dask = [r.result() for r in res]
np.allclose(result, result_dask)
```
In my real-world case, the `Model` class does a lot of initialisation, pre-calculation, etc., and it can take 10 to 50 times longer to initialise than to run the `compute_me` method. So in my case it would be beneficial for each worker to hold a pre-defined instance of `Model` locally, and have dask deliver the input to `compute_me`.
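A quick cost model shows why keeping the instance alive matters. This is a sketch under simple assumptions (one worker, no transfer or scheduling overhead); `total_time` is a hypothetical helper and the numbers (30 s set-up, 1 s per call, 1000 calls, a 30x ratio inside the 10 to 50x range above) are illustrative, not measurements.

```python
def total_time(n_calls, init_cost, call_cost, reuse_model):
    """Idealised wall-clock cost of n_calls evaluations on one worker."""
    if reuse_model:
        # Build the model once, then pay only the cheap per-call cost.
        return init_cost + n_calls * call_cost
    # Rebuild the model for every call.
    return n_calls * (init_cost + call_cost)


# Hypothetical numbers: 30 s set-up, 1 s per call, 1000 calls.
reused = total_time(1000, 30.0, 1.0, reuse_model=True)    # 1030 s
rebuilt = total_time(1000, 30.0, 1.0, reuse_model=False)  # 31000 s
speedup = rebuilt / reused                                # roughly 30x
```

As the number of calls grows, the speedup approaches the ratio `(init_cost + call_cost) / call_cost`, i.e. roughly the set-up-to-compute ratio.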
This post (2nd answer) shows how to initialise an object and store it in the worker's namespace, but the example doesn't show how to pass different initialisation arguments to each worker. Or is there some other way of doing this?
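One way to give each worker its own configuration is to key the configurations by worker address. According to the `Client.run` documentation, a function with a parameter named `dask_worker` receives the worker it runs on, and `client.scheduler_info()["workers"]` lists the worker addresses. The sketch below relies on that. `assign_configs`, `init_model`, `compute_on_worker` and the `model` attribute on the worker object are hypothetical names, and storing state as a worker attribute is an assumed convention, not a documented dask API. To keep it runnable without a cluster, it uses `SimpleNamespace` stand-ins with made-up addresses in place of real workers.

```python
from types import SimpleNamespace


class Model:
    """Pure-Python stand-in for the expensive-to-build model."""

    def __init__(self, power):
        self.power = power

    def compute_me(self, x):
        return -sum(v ** self.power for v in x)


def assign_configs(worker_addresses, configs):
    """Map each worker address to one configuration (one model per worker)."""
    if len(configs) > len(worker_addresses):
        raise ValueError("more configurations than workers")
    # Sorting makes the assignment deterministic for a given set of workers.
    return dict(zip(sorted(worker_addresses), configs))


def init_model(config_by_address, dask_worker):
    """Build this worker's model once and keep it on the worker object.

    Assumed usage with dask: client.run(init_model, config_by_address),
    where client.run supplies `dask_worker`.
    """
    config = config_by_address.get(dask_worker.address)
    if config is not None:
        dask_worker.model = Model(config)  # hypothetical attribute name


def compute_on_worker(x, dask_worker):
    """Evaluate the pre-built model; workers without one return None.

    Assumed usage with dask: client.run(compute_on_worker, x), which
    returns a dict mapping worker address to result.
    """
    model = getattr(dask_worker, "model", None)
    return None if model is None else model.compute_me(x)


# Local demo with stand-in worker objects (real ones would come from dask).
fake_workers = [SimpleNamespace(address=a)
                for a in ("tcp://10.0.0.2:40001", "tcp://10.0.0.1:40001")]
mapping = assign_configs([w.address for w in fake_workers], [1, 2])
for w in fake_workers:
    init_model(mapping, dask_worker=w)
results = {w.address: compute_on_worker([13, 23, 37], dask_worker=w)
           for w in fake_workers}
```

A design caveat: `client.run` executes directly on every worker rather than going through the task scheduler, so it suits this one-model-per-worker broadcast but not load balancing. Dask's actor support (`client.submit(..., actor=True)`) is another mechanism aimed at long-lived stateful objects.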