I'm learning dask and I want to generate random strings. But this only works if the import statements are inside the function f.
This works:
import dask
from dask.distributed import Client, progress
c = Client(host='scheduler')
def f():
    from random import choices
    from string import ascii_letters
    rand_str = lambda n: ''.join(choices(population=list(ascii_letters), k=n))
    return rand_str(5)
xs = []
for i in range(3):
    x = dask.delayed(f)()
    xs.append(x)
res = c.compute(xs)
print([r.result() for r in res])
This prints something like ['myvDi', 'rZnYO', 'MyzaG']. This is good, as the strings are random.
This, however, doesn't work:
from random import choices
from string import ascii_letters
import dask
from dask.distributed import Client, progress
c = Client(host='scheduler')
def f():
    rand_str = lambda n: ''.join(choices(population=list(ascii_letters), k=n))
    return rand_str(5)
xs = []
for i in range(3):
    x = dask.delayed(f)()
    xs.append(x)
res = c.compute(xs)
print([r.result() for r in res])
This prints something like ['tySQP', 'tySQP', 'tySQP'], which is bad because all the random strings are the same.
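One possible explanation, which can be checked without a cluster: the module-level choices is a bound method of a hidden random.Random instance, so serializing it also serializes that instance's current state. The sketch below uses only the standard library and assumes that dask's serialization behaves like plain pickle for this object (an assumption, not something confirmed here); rand_str is rewritten to take the function as an argument purely for illustration.

```python
import pickle
from random import choices
from string import ascii_letters

def rand_str(pick, n=5):
    # Same expression as in f, but the choices function is passed in explicitly
    return ''.join(pick(population=list(ascii_letters), k=n))

# Serialize the bound method once, as would happen when f and its globals are shipped
payload = pickle.dumps(choices)

# Each load rebuilds a separate Random instance with the identical saved state
copies = [pickle.loads(payload) for _ in range(3)]

results = [rand_str(pick) for pick in copies]
print(results)  # three identical 5-letter strings
```

If this is indeed what happens on the workers, importing inside f avoids it because each worker then uses its own module-level generator instead of a deserialized copy.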
So I'm curious how I'm going to distribute large, non-trivial code. My goal is to be able to pass arbitrary JSON to a dask.delayed function and have that function perform analysis using other modules, like Google's ortools.
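To make the goal concrete, here is a minimal sketch of such a task. The name analyze and the "values" field are hypothetical, and the standard-library statistics module stands in for a heavier dependency such as ortools. The imports are kept inside the function, mirroring the version of f that works.

```python
import json

def analyze(payload: str) -> dict:
    # Import inside the function so each worker resolves the module locally
    from statistics import mean  # stand-in for a heavier module such as ortools

    data = json.loads(payload)
    values = data["values"]  # hypothetical field name
    return {"count": len(values), "mean": mean(values)}

# Called locally here; no cluster is needed to try it out
result = analyze(json.dumps({"values": [1, 2, 3, 4]}))
print(result)
```

With a scheduler available, this would presumably be submitted the same way as f, via dask.delayed(analyze)(payload) followed by c.compute(...).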
Any suggestions?