I am trying to write a Nextflow script that calls a Python script. The Python script imports seven modules, but when it runs under Nextflow, python3 cannot find two of them (cv2 and matplotlib) and crashes. If I call the script directly from bash it works fine. I would like to avoid creating a Docker image to run this script.

Error executing process > 'grab_images (1)'

Caused by:
  Process `grab_images (1)` terminated with an error exit status (1)

Command executed:

  python3 --version
  echo 'processing image-1.npy'
  python3 /home/hq/cv_proj/k_means2.py image-1.npy

Command exit status:
  1

Command output:
  Python 3.7.3
  processing image-1.npy

Command error:
  Traceback (most recent call last):
    File "/home/hq/cv_proj/k_means2.py", line 5, in <module>
      import matplotlib.pyplot as plt 
  ModuleNotFoundError: No module named 'matplotlib'

Work dir:
  /home/hq/cv_proj/work/7f/b787c62ec420b2b5eb490603ef913f

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

I think there is a path issue, since modules like numpy, sys, re, and time load successfully. How can I fix this?

Thanks in advance

UPDATE

To assist others who may have problems using Python in Nextflow scripts: make sure your shebang is correct. I was using

    #!/usr/bin/python 

instead of

    #!/usr/bin/python3

All of my packages were installed with pip3 and I use python3 exclusively, so the shebang has to point at python3.
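For anyone debugging a similar mismatch, a small diagnostic can show which interpreter a process really uses and which modules it can see. The following is only a sketch: the function name `describe_environment` is made up for illustration, and the module list is taken from the error above. Running it once from a normal shell and once inside the Nextflow process should make any difference in interpreter, user, or module location visible.

```python
#!/usr/bin/env python3
"""Report which interpreter is running and which modules it can locate."""
import importlib.util
import os
import sys


def describe_environment(module_names):
    """Return a dict describing the interpreter and module availability.

    `describe_environment` is a hypothetical helper, not part of Nextflow.
    """
    modules = {}
    for name in module_names:
        # find_spec returns None when a top-level module cannot be located.
        spec = importlib.util.find_spec(name)
        if spec is None:
            modules[name] = None
        else:
            # Namespace packages have no single origin file.
            modules[name] = spec.origin or "(namespace package)"
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        # os.getuid is only available on POSIX systems.
        "uid": os.getuid() if hasattr(os, "getuid") else None,
        "modules": modules,
    }


if __name__ == "__main__":
    info = describe_environment(["numpy", "cv2", "matplotlib"])
    print("interpreter:", info["executable"], info["version"])
    print("user id:", info["uid"])
    for name, origin in info["modules"].items():
        print(f"{name}: {origin or 'NOT FOUND'}")
```

If the user id or the module locations differ between the two runs, the process is probably running as a different user or with a different Python installation than the interactive shell.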

TheCodeNovice
  • do a `which python` in your process block and in your regular bash shell to ensure you are using the same python installation in both – Pallie Aug 16 '21 at 11:59

1 Answer

Best to avoid absolute paths to your script(s) in your process declarations. This section of the docs is worth taking some time to read: https://www.nextflow.io/docs/latest/sharing.html#manage-dependencies, particularly the subsection on how to manage third party scripts:

> Any third party script that does not need to be compiled (Bash, Python, Perl, etc) can be included in the pipeline project repository, so that they are distributed with it.
>
> Grant the execute permission to these files and copy them into a folder named bin/ in the root directory of your project repository. Nextflow will automatically add this folder to the PATH environment variable, and the scripts will automatically be accessible in your pipeline without the need to specify an absolute path to invoke them.

Then the problem is how to manage your Python dependencies. You mentioned Docker is not an option. Is Conda also not an option? The config for Conda might look something like:

    name: myenv
    channels:
      - conda-forge
      - bioconda
      - defaults
    dependencies:
      - conda-forge::matplotlib-base=3.4.3
      - conda-forge::numpy=1.21.2
      - conda-forge::opencv=4.5.2

Then if the above is in a file called environment.yml, create the environment with:

    conda env create

See also the best practices for using Conda.

Steve
  • Thanks for the link I will take a read. I think my issue is that I am running nextflow scripts with sudo rather than myself. I installed the other modules with sudo pip3 and it worked. That is a gross solution though. I need to figure out how to get nextflow without elevated privileges. I have googled but no luck so far. – TheCodeNovice Aug 16 '21 at 18:45
  • @TheCodeNovice Ah, yeah that'll be it. You definitely don't need to be root to run Nextflow :-) All you need to install is `curl -s https://get.nextflow.io | bash` (as yourself), assuming you have Java 8 or later. Then just move the executable to somewhere in your $PATH. Also, instead of installing Python mods globally, it might be preferable to use a virtual env: `python -mvenv myenv`, activate it, then `pip install` your deps. – Steve Aug 17 '21 at 02:46
  • I tried this but I am getting `.nextflow/history.lock (Permission denied)`. Have you seen that before? – TheCodeNovice Aug 17 '21 at 20:40
  • @TheCodeNovice Permission denied... where are you trying to run your Nextflow executable from? Did the test command run ok: `./nextflow run hello`? – Steve Aug 18 '21 at 10:53
  • If I run hello from my bare home directory it runs. If I try to run it from a folder in my home directory like `~/foo` using `~/nextflow run hello`, I get the history lock error. – TheCodeNovice Aug 23 '21 at 02:56
  • @TheCodeNovice You could see that error if you don't have write permissions on `~/foo/.nextflow`. Check that you own it, or just blow it away and have Nextflow re-create it on your next run. It could be owned by root if you've been running Nextflow as root. – Steve Aug 23 '21 at 11:00
  • This worked. Since I first ran as root, it created all this infrastructure that I was not the owner of, which was blocking everything. Blowing it all away did the trick. Thank you! – TheCodeNovice Aug 23 '21 at 18:19
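
For readers who hit the same `history.lock (Permission denied)` error, the ownership check suggested in the comments can be sketched in Python. This is only an illustration: `find_foreign_owned` is a made-up helper name, it relies on `os.getuid` (POSIX only), and checking the `work` directory as well as `.nextflow` is an assumption based on the work dir shown in the question.

```python
import os
from pathlib import Path


def find_foreign_owned(root, uid=None):
    """Return paths at or under `root` that are not owned by `uid`.

    Hypothetical helper; `uid` defaults to the current user's id.
    """
    uid = os.getuid() if uid is None else uid
    root = Path(root)
    if not root.exists():
        return []
    candidates = [root, *root.rglob("*")]
    # lstat inspects the entry itself instead of following symlinks.
    return [path for path in candidates if path.lstat().st_uid != uid]


if __name__ == "__main__":
    # Run from the directory where Nextflow is launched.
    for name in (".nextflow", "work"):
        for path in find_foreign_owned(name):
            print(f"not owned by current user: {path}")
```

Anything it lists was probably created by an earlier run as root, and removing it or changing its owner lets Nextflow re-create it as the normal user.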