
I have set up the file system as follows:

\project
   \something
       __init__.py
       some.py (with a function test() defined)
   run.py
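
For reference, some.py is essentially just a module that defines a plain function test() (simplified here; the actual body is a placeholder and doesn't matter for the question):

# some.py -- simplified; the real body of test() is irrelevant here
def test():
    return "result from test()"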

And my run.py looks like this:

import os
import sys
import dask
from dask.distributed import Client
from dask_jobqueue import SLURMCluster
import time

def run_task1():
    sys.path.append('/project/something')
    from some import test
    return test()

def run_task2():
    from something.some import test # because something dir is in the current working dir
    return test()


def run_controller():
    cluster = SLURMCluster(...)
    cluster.scale_up(2)
    client = Client(cluster)
    sys.path.append('/project/something') 
    os.environ['PATH'] = os.environ.get('PATH', '') + os.pathsep + '/project/something'
    os.environ['PYTHONPATH'] = os.environ.get('PYTHONPATH', '') + os.pathsep + '/project/something'
    from some import test
    v1 = [
        #dask.delayed(run_task1)() for _ in range(2)  #<--- this works
        #dask.delayed(run_task2)() for _ in range(2)  #<--- this works too
        dask.delayed(test)() for _ in range(2)        #<--- fails, but I need to do this
    ]
    values = dask.compute(*v1)
    return values

values = run_controller()

The error is that the worker fails immediately: it cannot run test() because it cannot import it from some.py. I verified that the dask worker's os.environ['PATH'], os.environ['PYTHONPATH'], and sys.path all contain the added path to some.py, but the worker still cannot run it. Below is the error logged in the SLURM log:

ModuleNotFoundError: No module named 'some'
distributed.worker - ERROR - Could not deserialize task
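
For completeness, this is roughly how I checked the workers' environment (a reconstruction of the check, using client.run, which executes a function on every worker and returns the results per worker):

import os
import sys

def inspect_env():
    # collect the import-related environment as seen by each worker
    return {
        "sys.path": sys.path,
        "PATH": os.environ.get("PATH", ""),
        "PYTHONPATH": os.environ.get("PYTHONPATH", ""),
    }

print(client.run(inspect_env))  # '/project/something' shows up in all three on every worker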

I need to run the function directly, i.e. I cannot use a wrapper that performs an explicit import inside the dask worker, and the direct call is exactly the method that does not work.

I have a hacky workaround in the spirit of run_task2(): creating a symlink to some.py in the current working directory. But I am wondering whether there is a proper way to set up the dask worker's environment so that a direct dask.delayed call on test() works. A sketch of the symlink hack is below.
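
The symlink hack amounts to something like this, run once from the project root before starting run.py (paths are illustrative):

import os

# hacky workaround: expose some.py in the current working directory so that
# "from some import test" resolves on the workers without touching sys.path
if not os.path.exists("some.py"):
    os.symlink("/project/something/some.py", "some.py")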
