I have set up the file system like this:
\project
    \something
        __init__.py
        some.py  (defines a function test())
    run.py
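For context, some.py does nothing special; a minimal stand-in would look like this (the body here is just a placeholder):

def test():
    # placeholder body; the real function does the actual work
    return 'hello from test'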
And my run.py looks like this:
import os
import sys
import time

import dask
from dask.distributed import Client
from dask_jobqueue import SLURMCluster


def run_task1():
    # make /project/something importable on the worker, then import
    sys.path.append('/project/something')
    from some import test
    return test()


def run_task2():
    # works because the something dir is in the current working dir
    from something.some import test
    return test()


def run_controller():
    cluster = SLURMCluster(...)
    cluster.scale_up(2)
    client = Client(cluster)

    sys.path.append('/project/something')
    os.environ['PATH'] += os.pathsep + '/project/something'
    os.environ['PYTHONPATH'] = os.environ.get('PYTHONPATH', '') + os.pathsep + '/project/something'

    from some import test
    v1 = [
        # dask.delayed(run_task1)() for _ in range(2)  # <--- this works
        # dask.delayed(run_task2)() for _ in range(2)  # <--- this works too
        dask.delayed(test)() for _ in range(2)  # <--- fails, but I need to do this
    ]
    values = dask.compute(*v1)
    return values


values = run_controller()
The worker fails immediately because it cannot run test(): it cannot import it from some.py. I verified that the dask worker's os.environ['PATH'], os.environ['PYTHONPATH'] and sys.path all contain the added path to some.py, but the worker still cannot run it. Below is the error logged in the SLURM log.
'''
ModuleNotFoundError: No module named 'some'
distributed.worker - ERROR - Could not deserialize task
'''
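For completeness, this is roughly how I checked the workers' environment (a minimal sketch; check_env is just an illustrative name, and client is the Client created in run_controller()):

def check_env():
    import os
    import sys
    return {
        'PATH': os.environ.get('PATH', ''),
        'PYTHONPATH': os.environ.get('PYTHONPATH', ''),
        'sys.path': list(sys.path),
    }

# Client.run executes the function on every worker and returns one result per worker
print(client.run(check_env))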
I need to call the function directly, i.e. I cannot use a wrapper that performs an explicit import on the dask worker; but the direct call is exactly the approach that does not work.
I have a hacky workaround in the spirit of run_task2(): create a symlink to some.py in the current working directory (a sketch follows below). But I am wondering whether there is a proper way to set up the dask worker's environment so that a direct dask.delayed call on test() works.
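For reference, the symlink hack looks roughly like this (a minimal sketch; the paths are the ones from my layout above):

import os

# hacky workaround: expose some.py in the current working directory so that
# 'from some import test' also resolves on workers started from that directory
if not os.path.lexists('some.py'):
    os.symlink('/project/something/some.py', 'some.py')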