Using dask distributed, I am trying to submit a function that is defined in another file named worker.py. On the workers I get the following error:

No module named 'worker'

I'm unable to figure out what I'm doing wrong here.

Here is a sample of my code:

import queue

import worker

def run(self):
    dask_queue = queue.Queue()
    remote_queue = self.executor.scatter(dask_queue)
    map_queue = self.executor.map(worker.run, remote_queue)
    result = self.executor.gather(map_queue)

    # Load data into the queue
    for option in self.input.get_next_option():
        remote_queue.put([self.server, self.arg, option])

Here is the complete traceback obtained on the worker side:

distributed.core - INFO - Failed to deserialize b'\x80\x04\x95\x19\x00\x00\x00\x00\x00\x00\x00\x8c\x06worker\x94\x8c\nrun\x94\x93\x94.'
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/distributed/core.py", line 74, in loads
    return pickle.loads(x)
ImportError: No module named 'worker'
distributed.worker - WARNING - Could not deserialize task
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/distributed/worker.py", line 496, in compute_one
    task)
  File "/usr/local/lib/python3.5/dist-packages/distributed/worker.py", line 284, in deserialize
    function = loads(function)
  File "/usr/local/lib/python3.5/dist-packages/distributed/core.py", line 74, in loads
    return pickle.loads(x)
ImportError: No module named 'worker'

OneCricketeer
Bertrand

4 Answers

5

This problem can occur in two cases: a module imported by the main code that calls the dask-distributed function cannot be found, or a module imported inside the dask-distributed function cannot be found. Either way, the solution is to update sys.path to point to where those modules are.

In my case, I updated both.

For example, let's say your main script imports module xxx and the function you want to distribute imports module yyy. The code should look like this:

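from dask.distributed import Client
import sys

def update_syspath():
    sys.path.insert(0, 'module_xxx_location')

# You have to update sys.path before importing the xxx module
update_syspath()
import xxx

def dask_function(params):
    # This runs on the worker, so extend the worker's sys.path before importing yyy
    sys.path.insert(0, 'module_yyy_location')
    import yyy

client = Client()  # with no arguments this starts a local cluster; pass your scheduler address otherwise
client.submit(dask_function, params)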
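One caveat with calling sys.path.insert inside the distributed function: the function runs once per task, so an unconditional insert adds a duplicate entry to the worker's sys.path on every call. A minimal sketch of a guard follows; the helper name ensure_on_path is hypothetical (not part of dask), and 'module_yyy_location' is the same placeholder used above.

```python
import sys

def ensure_on_path(directory):
    """Prepend directory to sys.path only if it is not already there."""
    if directory not in sys.path:
        sys.path.insert(0, directory)

# Calling the helper repeatedly, as a worker would across many tasks,
# leaves a single entry on sys.path.
ensure_on_path('module_yyy_location')
ensure_on_path('module_yyy_location')
print(sys.path.count('module_yyy_location'))
```

Inside the distributed function, the helper would be called before the import of yyy, in place of the bare sys.path.insert.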
wakandan
4

I faced a similar issue. Functions from a Python module were used while building a dask graph, but the worker process could not find that module.

The following errors appeared in the worker console. Here, tasks.py contains the worker function used in the dask graph.

[ worker 10.0.2.4 ] : ModuleNotFoundError: No module named 'tasks'
[ worker 10.0.2.4 ] : distributed.protocol.pickle - INFO - Failed to deserialize b'\x80\x04\x95\x14\x00\x00\x00\x00\x00\x00\x00\x8c\x05tasks\x94\x8c\x06ogs_mk\x94\x93\x94.'

Using Client.upload_file (shown below) to send the local Python module to the workers resolved the issue.

client.upload_file('tasks.py')     ## Send local package to workers
results = client.get(dsk, 'root')  ## get the results
3

Edit: see MRocklin's comment for a cleaner solution

If the code to execute on a dask worker lives in an external module, that module must be importable on the worker itself. The function's code is not serialized from the client to the worker; only a reference to the module and function name is sent.
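This behaviour can be reproduced without a cluster, because it comes from how pickle treats module-level functions. The sketch below uses only the standard library; the module name demo_tasks is made up for the illustration. It pickles a function from a temporary module, then hides the module to imitate a worker that cannot see it.

```python
import importlib
import os
import pickle
import sys
import tempfile

# Create a throwaway module (hypothetical name) in a temporary directory.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "demo_tasks.py"), "w") as f:
    f.write("def run(x):\n    return x + 1\n")

sys.path.insert(0, module_dir)
importlib.invalidate_caches()
import demo_tasks

# The payload contains the module name and the function name, not the code.
payload = pickle.dumps(demo_tasks.run)
print(payload)

# Imitate a worker that cannot find the module.
sys.path.remove(module_dir)
del sys.modules["demo_tasks"]
importlib.invalidate_caches()

try:
    pickle.loads(payload)
    error = None
except ImportError as exc:
    error = exc

print(type(error).__name__, error)
```

The payload has the same shape as the bytes in the "Failed to deserialize" log lines: a module name followed by a function name, which the receiving process has to import for itself.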

Changing PYTHONPATH so that the worker can find the module fixed the issue. A similar problem was reported in the dask issue tracker:

https://github.com/dask/distributed/issues/344

Bertrand
  • You might also want to look at the Client.upload_file method: http://distributed.readthedocs.io/en/latest/api.html#distributed.client.Client.upload_file – MRocklin Oct 14 '16 at 12:29
  • It looks indeed better than my own solution :) Thx – Bertrand Oct 14 '16 at 12:55
  • @Bertrand I know this is an old post but I couldn't understand what you mean by saying "Changing my PYTHONPATH...". I've tried to add my project folder to PYTHONPATH but this didn't solve my issue. I'm using venv. Do you think this may have something to do with the venv? How did you "Change your PYTHONPATH" ? – MehmedB Nov 18 '19 at 19:31
  • Yes they are. But I realized something. The missing thing is not an actual module but a script inside the project. I've packaged the script and installed it with pip module. This fixed the issue. – MehmedB Nov 23 '19 at 08:34
  • Somehow, all the SO threads I find about dask never seem to have a solution. – wakandan Nov 13 '21 at 04:27
0

When the Dask worker couldn't find modules in my current working directory, I solved it by slightly adjusting the command that starts it:

PYTHONPATH=$(pwd) dask-worker 127.0.0.1:8786
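The command works because PYTHONPATH is read when the worker's Python process starts and its entries are added to that process's sys.path. A small sketch of the same mechanism with a plain Python child process instead of dask-worker; the module name demo_pythonpath_mod is invented for the example.

```python
import os
import subprocess
import sys
import tempfile

# Put a throwaway module (hypothetical name) in a directory that is
# not on the default import path.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "demo_pythonpath_mod.py"), "w") as f:
    f.write("VALUE = 42\n")

# Start a child interpreter with PYTHONPATH pointing at that directory,
# the same way the shell command sets it for dask-worker.
env = dict(os.environ, PYTHONPATH=module_dir)
result = subprocess.run(
    [sys.executable, "-c",
     "import demo_pythonpath_mod; print(demo_pythonpath_mod.VALUE)"],
    env=env,
    capture_output=True,
    text=True,
)
print(result.returncode, result.stdout.strip())
```

The variable has to be set in the environment of the process that launches the worker; setting it only in the client's shell has no effect on workers that were started elsewhere.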
Michał Fronczyk