
I am running into some errors when trying to set up my own client using a dask-jobqueue PBSCluster instead of the default local cluster (i.e., client = Client()).

With the default local cluster, my own modules were recognized, but the workers in the PBSCluster could not find them. This page and other research were helpful in understanding what I might be able to do.

I organized my modules into a package and installed it with pip install -e . since I'll still be developing it. I confirmed that my Python environment's site-packages directory contains my package (via an .egg-link file).
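For completeness, the package has a minimal setup.py roughly like the sketch below (the name my_package is just a placeholder, not my real project), and I ran pip install -e . from the directory containing it:

# setup.py -- minimal sketch; "my_package" is a placeholder name
from setuptools import setup, find_packages

setup(
    name="my_package",
    version="0.1",
    packages=find_packages(),  # finds my_package/ (the directory with __init__.py)
)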

I hoped installing the package would make my modules available, but I received the same error when I ran my code after setting up a basic PBSCluster:

from dask_jobqueue import PBSCluster
from dask.distributed import Client

cluster = PBSCluster(cores=x, memory=y)  # x = cores per job, y = memory per job as a string, e.g. "16GB"
cluster.scale(n)                         # request n workers
client = Client(cluster)

Is my basic idea of installing the modules as a package not enough?

I looked into client.upload_file based on this answer as another way to make the reference to my module file explicit. Will I still need to do something like this to get the modules onto the workers directly?
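For context, what I was imagining is roughly the sketch below, reusing the client from the snippet above (my_module.py and some_function are placeholders, not my real code):

# upload_file sends a .py, .egg, or .zip file to the workers currently in the cluster
client.upload_file("my_module.py")

def uses_my_module(x):
    import my_module  # imported on the worker from the uploaded file
    return my_module.some_function(x)

future = client.submit(uses_my_module, 42)
result = future.result()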

Apologies for the length; I am very new to both dask and working on an HPC.

Thanks for any help.

DanS

1 Answer


First, just a sanity check: When using an HPC cluster, there is typically a shared filesystem, which all workers can access (and so can your client machine). Is that the case for your cluster? If so, make sure your conda environment is in a shared location that all workers can access.
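One quick way to check this (just a sketch, assuming client is the Client you created for the PBSCluster): ask every worker which Python executable it is running, and confirm they all point into the shared environment you expect.

def which_python():
    import sys
    return sys.executable  # interpreter path on that worker

# returns a dict mapping each worker's address to its Python path
print(client.run(which_python))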

I organized my modules into a package and used pip install -e .

That should work, as long as your source code is also on the shared filesystem. The directory pointed to by the .egg-link file should be accessible from the worker machines. Is it?
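You can check that directly from the workers as well (again just a sketch; replace the path with whatever your .egg-link actually points at):

import os

# True/False per worker for whether the source directory is visible there
print(client.run(os.path.exists, "/path/to/your/editable-install/source"))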

Stuart Berg
  • Yes, I believe so. The `egg-link` essentially contains `/projects/users/this_project/my_stuff`, with the package contained in what in this example is called 'my_stuff'. This path also comes up when I `import sys` and check `sys.path`. – DanS Jul 15 '20 at 19:07