I use dask clusters to process datasets. My dask Python project is structured as follows:

project_name/
  folder_a
    a.py
  folder_b
    b.py

In b.py, I import function_a() from a.py as follows:

import dask
from dask import delayed

from folder_a.a import function_a

@delayed
def test(input_str):
  return function_a(input_str)

client = ...

d_task = [test(str)]
dask.compute(d_task)

However, the dask worker nodes raise ModuleNotFoundError: No module named 'folder_a'

I have tried several solutions, such as setting PYTHONPATH={dir}/project_name, but none of them worked.

How can I solve this problem? Is there a way to add {dir}/project_name to the dask worker environment?

DuFei
  • Please see my answer at https://stackoverflow.com/questions/39977635/dask-no-module-named-xxxx-error/69981627#69981627 – wakandan Nov 15 '21 at 22:30

1 Answer

You did not say which scheduler you were using, but perhaps the new doc that I just wrote will help you sort out import issues: https://docs.dask.org/en/latest/setup/environment.html

mdurant
  • I have read the docs and I understand what you mean, but I still cannot find a solution. I run `dask-scheduler` on one machine and `dask-worker IP:8786` on the other machines. The Python project is stored at /home/dufei/data/project_name, where /home/dufei/data is an NFS path. I know I need to tell dask-worker that all the packages in project_name are needed and should be added to the worker environment. upload_file is the only way I can find to do this, but it cannot upload a folder, and I can't package folder_a into a whl since I have related py files in other folders – DuFei Nov 03 '20 at 03:41
  • If all the client and workers can see the same file system, then you can include the path you need in your PYTHONPATH. Generally, you *should* be able to make a set of packages out of all of your code, with appropriate dependencies declared. – mdurant Nov 03 '20 at 13:49
  • I have tried setting PYTHONPATH in my .bashrc and using sys.path.append(project_name) in my own py file. However, neither works. – DuFei Nov 04 '20 at 02:44
  • You should verify the value of `sys.path` seen by your workers (`client.run(lambda: sys.path)`). You may instead want to sym-link your directory into one of the module search paths. – mdurant Nov 04 '20 at 14:03
  • @mdurant your documentation doesn't show how to manage sys.path on the workers. My main script has imports at the top of the file that refer to my custom modules, and it already fails on importing those before reaching `client.submit` – wakandan Nov 13 '21 at 04:24
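
The suggestions in the comments above (check `sys.path` on the workers, and make the shared project directory importable) could be sketched roughly as follows. This is a minimal sketch, not a confirmed fix for this setup. It assumes `client` is a `dask.distributed.Client` connected to the scheduler and that every worker can see the same NFS path. The helper names `worker_sys_path`, `add_project_to_path`, and `prepare_workers` are hypothetical. The helpers are assumed to live in the main script itself, so that they are serialized by value and do not themselves need to be importable on the workers.

```python
import importlib
import sys


def worker_sys_path():
    """Return sys.path of whichever process runs this function."""
    return list(sys.path)


def add_project_to_path(project_dir):
    """Put project_dir at the front of sys.path in the current process.

    Calling it twice does not add a duplicate entry.
    """
    project_dir = str(project_dir)
    if project_dir not in sys.path:
        sys.path.insert(0, project_dir)
    # Make the import system notice modules under the newly added directory.
    importlib.invalidate_caches()
    return list(sys.path)


def prepare_workers(client, project_dir):
    """Run add_project_to_path on every worker currently connected.

    Assumption: `client` is a dask.distributed.Client. client.run only
    reaches workers that are connected at call time, so workers that join
    later would need the same call again.
    """
    return client.run(add_project_to_path, str(project_dir))


# Hypothetical usage, with the path taken from the comment above:
#   add_project_to_path("/home/dufei/data/project_name")      # client side
#   prepare_workers(client, "/home/dufei/data/project_name")  # worker side
#   print(client.run(worker_sys_path))                        # verify
```

The client-side call has to come before `from folder_a.a import function_a`, since that import runs in the client process first. The worker-side call has to come before `dask.compute`, because the workers import `folder_a` when they deserialize the task.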