I would like to install R packages from GitHub in an R conda environment created by Snakemake, as well as Python libraries via pip in a Python environment. I'll then use these environments in a whole set of rules.

My initial thought was to create a rule running a script to install the specified packages.

For instance, my initial run was: snakemake -j1 --use-conda -R create_r_environment.

My Snakefile:

rule create_r_environment:
    conda:
        "envs/r.yaml"
    script:
        "scripts/r-dependencies.R"

rule create_python_environment:
    conda:
        "envs/python.yaml"
    script:
        "scripts/python-dependencies.py"    

My envs/r.yaml file:

channels:
 - conda-forge
dependencies:
 - r=4.0

My r-dependencies.R file:

remotes::install_github("ramiromagno/gwasrapidd", upgrade = "never")

My envs/python.yaml file:

channels:
 - conda-forge
dependencies:
 - python=3.8.2

My python-dependencies.py file:

!pip install gseapy

The log output:

Building DAG of jobs...
Creating conda environment envs/r.yaml...
Downloading and installing remote packages.
Environment for envs/r.yaml created (location: .snakemake/conda/388f7df8)
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1   create_r_environment
    1

[Fri Oct 30 22:38:56 2020]
rule create_r_environment:
    jobid: 0

Activating conda environment: /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/conda/388f7df8
[Fri Oct 30 22:38:57 2020]
Error in rule create_r_environment:
    jobid: 0
    conda-env: /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/conda/388f7df8

RuleException:
CalledProcessError in line 5 of /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/Snakefile:
Command 'source /home/cmcouto-silva/miniconda3/bin/activate '/home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/conda/388f7df8'; set -euo pipefail;  Rscript --vanilla /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/scripts/tmpa6jdxovx.r-dependencies.R' returned non-zero exit status 1.
  File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2168, in run_wrapper
  File "/home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/Snakefile", line 5, in __rule_create_r_environment
  File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 529, in _callback
  File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/concurrent/futures/thread.py", line 57, in run
  File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 515, in cached_or_run
  File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2199, in run_wrapper
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/log/2020-10-30T223743.852983.snakemake.log

My folder structure:

.
├── envs
│   ├── python.yaml
│   └── r.yaml
├── scripts
│   ├── python-dependencies.py
│   └── r-dependencies.R
└── Snakefile

It successfully creates the environment but fails when running the script, and I don't know why. I've changed the content of the scripts/r-dependencies.R file to install.packages("data.table") to see whether the issue was with the GitHub package, but it fails anyway. The same happens when I run the rule create_python_environment (output not shown here).

Any help?


Edit after the accepted answer

As @dariober pointed out, I forgot to install the remotes package before calling it in the script. I added it to the dependencies in the .yaml file, and it worked well. I also installed the pip libraries with a shell directive instead of a Python script.

I would like to highlight some points though, just in case anyone's facing the same or similar problem:

First, I could successfully install the other packages I needed, but some of them require specific system libraries (e.g. libcurl). These are installed on my system but are not recognized inside the Snakemake conda environment, which forces me to either install them in the conda environment itself (good for reproducibility, although I don't know how to do that yet) or specify the library path. Maybe a better option would be to use a container, as @merv suggested in the comments.

Second, I figured out that the conda environment file used by Snakemake already provides a way to install pip libraries, through a pip section under dependencies. From the documentation, it looks like this:

name: stats2
channels:
  - javascript
dependencies:
  - python=3.6   # or 2.7
  - bokeh=0.9.2
  - numpy=1.9.*
  - nodejs=0.10.*
  - flask
  - pip:
    - Flask-Testing
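
Once pip packages are declared this way, a small check run inside the environment can confirm that they are importable. This is a minimal sketch using only the standard library. The helper names missing_modules and require_modules are hypothetical, and it assumes top-level import names, which can differ from the pip distribution name (gseapy imports as gseapy, while Flask-Testing imports as flask_testing):

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names the current interpreter cannot find."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in module_names if importlib.util.find_spec(name) is None]


def require_modules(module_names):
    """Raise ImportError listing every module that is not importable."""
    missing = missing_modules(module_names)
    if missing:
        raise ImportError("Missing modules: " + ", ".join(missing))
```

Calling require_modules(["gseapy"]) at the top of a script run by a rule would make the rule fail early, with a readable message, if the pip section was not applied to the environment.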
Cainã Max Couto-Silva
  • As a heavy Conda user (including with Snakemake), it's in situations like this (need custom packages or software) where I shift over to building a Docker/Singularity container with the software instead of a Conda env. Otherwise, the clean way to stay in Conda would be to build a Conda package. – merv Nov 02 '20 at 02:14
  • Glad to know your thoughts about this situation, @merv! I was talking to a friend after this question and we came to the same conclusion. I'm not a heavy conda and Snakemake user yet, but I want to be! – Cainã Max Couto-Silva Nov 02 '20 at 20:35

1 Answer

I think a few things are wrong:

  • remotes::install_github("ramiromagno/gwasrapidd", upgrade = "never"): the script calls the remotes package, but your r.yaml does not install it. You should include remotes in the dependencies of r.yaml (it is packaged on conda-forge as r-remotes).

  • !pip install gseapy is not valid Python code. It is a shell command, and the leading ! is IPython/Jupyter notebook syntax that does not work in a plain .py script. Also, gseapy is available from bioconda, so I don't see why you should install it with pip.
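
If the installation has to stay in a Python script, one option is to call pip through the interpreter that runs the script. This is a minimal sketch, assuming Snakemake runs scripts/python-dependencies.py with the Python of the rule's conda environment. The helper names build_pip_command and pip_install are hypothetical:

```python
import subprocess
import sys


def build_pip_command(packages):
    """Build the argv list that installs `packages` with this interpreter's pip."""
    # sys.executable is the Python running this script, so pip targets
    # the same environment (assumed here to be the rule's conda env).
    return [sys.executable, "-m", "pip", "install", *packages]


def pip_install(packages):
    """Run pip and raise CalledProcessError if the installation fails."""
    subprocess.check_call(build_pip_command(packages))
```

In the script, a call such as pip_install(["gseapy"]) would then replace the !pip line. Using check_call means a failed installation makes the rule fail instead of passing silently.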


Before OP edited the question

My envs/r.yaml file:

remotes::install_github("ramiromagno/gwasrapidd", upgrade = "never")

It's odd that you get the conda environment correctly created since that r.yaml is not a valid environment file.

This is what I tried to recreate your issue:

r.yaml

 cat r.yaml  
 remotes::install_github("ramiromagno/gwasrapidd", upgrade = "never")

Snakefile:

cat Snakefile 
rule create_r_environment:
    conda:
        "r.yaml"
    script:
        "r-dependencies.R"

Execute:

snakemake -j1 --use-conda -R create_r_environment

Building DAG of jobs...
Creating conda environment r.yaml...
Downloading and installing remote packages.
CreateCondaEnvironmentException:
Could not create conda environment from /home/dario/Downloads/r.yaml:

# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<

    Traceback (most recent call last):
      File "/home/dario/miniconda3/lib/python3.7/site-packages/conda/exceptions.py", line 1079, in __call__
        return func(*args, **kwargs)
      File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/cli/main.py", line 80, in do_call
        exit_code = getattr(module, func_name)(args, parser)
      File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/cli/main_create.py", line 80, in execute
        directory=os.getcwd())
      File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/specs/__init__.py", line 40, in detect
        if spec.can_handle():
      File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/specs/yaml_file.py", line 18, in can_handle
        self._environment = env.from_file(self.filename)
      File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/env.py", line 151, in from_file
        return from_yaml(yamlstr, filename=filename)
      File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/env.py", line 137, in from_yaml
        data = validate_keys(data, kwargs)
      File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/env.py", line 35, in validate_keys
        new_data = data.copy() if data else {}
    AttributeError: 'str' object has no attribute 'copy'

`$ /home/dario/miniconda3/bin/conda-env create --file /home/dario/Downloads/.snakemake/conda/095b0ca2.yaml --prefix /home/dario/Downloads/.snakemake/conda/095b0ca2`

  environment variables:
                 CIO_TEST=<not set>
        CMAKE_PREFIX_PATH=/home/dario/miniconda3/envs/tritume:/home/dario/miniconda3/envs/tritum
                          e/x86_64-conda-linux-gnu/sysroot/usr
  CONDA_AUTO_UPDATE_CONDA=false
      CONDA_BUILD_SYSROOT=/home/dario/miniconda3/envs/tritume/x86_64-conda-linux-gnu/sysroot
        CONDA_DEFAULT_ENV=tritume
                CONDA_EXE=/home/dario/miniconda3/bin/conda
             CONDA_PREFIX=/home/dario/miniconda3/envs/tritume
    CONDA_PROMPT_MODIFIER=(tritume)
         CONDA_PYTHON_EXE=/home/dario/miniconda3/bin/python
               CONDA_ROOT=/home/dario/miniconda3
              CONDA_SHLVL=1
            DEFAULTS_PATH=/usr/share/gconf/ubuntu.default.path
           MANDATORY_PATH=/usr/share/gconf/ubuntu.mandatory.path
                     PATH=/home/dario/miniconda3/envs/tritume/bin:/home/dario/miniconda3/condabi
                          n:/opt/gradle/gradle-5.2/bin:/home/dario/.local/share/umake/bin:/home/
                          dario/.local/bin:/home/dario/bin:/opt/gradle/gradle-5.2/bin:/usr/local
                          /sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/loc
                          al/games:/snap/bin:/usr/lib/jvm/java-10-oracle/bin:/usr/lib/jvm/java-1
                          0-oracle/db/bin
       REQUESTS_CA_BUNDLE=<not set>
            SSL_CERT_FILE=<not set>
               WINDOWPATH=2

     active environment : tritume
    active env location : /home/dario/miniconda3/envs/tritume
            shell level : 1
       user config file : /home/dario/.condarc
 populated config files : /home/dario/.condarc
          conda version : 4.8.3
    conda-build version : not installed
         python version : 3.7.6.final.0
       virtual packages : __glibc=2.27
       base environment : /home/dario/miniconda3  (writable)
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://conda.anaconda.org/bioconda/linux-64
                          https://conda.anaconda.org/bioconda/noarch
                          https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/dario/miniconda3/pkgs
                          /home/dario/.conda/pkgs
       envs directories : /home/dario/miniconda3/envs
                          /home/dario/.conda/envs
               platform : linux-64
             user-agent : conda/4.8.3 requests/2.22.0 CPython/3.7.6 Linux/4.15.0-91-generic ubuntu/18.04.4 glibc/2.27
                UID:GID : 1001:1001
             netrc file : None
           offline mode : False


An unexpected error has occurred. Conda has prepared the above report.

If submitted, this report will be used by core maintainers to improve
future releases of conda.
Would you like conda to send this report to the core maintainers?

[y/N]: 
Timeout reached. No report sent.


  File "/home/dario/miniconda3/envs/tritume/lib/python3.6/site-packages/snakemake/deployment/conda.py", line 320, in create

Anyway, your error says:

... r-dependencies.R' returned non-zero exit status 1

What do you have in r-dependencies.R?

dariober
  • Hey @dariober, thanks for your reply! Indeed, I've put the wrong r.yaml content in the question, sorry! I've updated the question with the content from .yaml, .R, and .py files. Thx for your time! – Cainã Max Couto-Silva Oct 31 '20 at 20:29
  • I see, thanks @dariober! My bad, I forgot to install `remotes` before calling it. Now it works as expected. About the pip usage, I used `gseapy` just as an example. Actually, I have a lot of libraries to install, and some of them require a pip installation. – Cainã Max Couto-Silva Nov 02 '20 at 20:33