I would like to be able to install R packages from GitHub in a R conda environment created by Snakemake, as well as python libraries via pip
in a python environment. I'll use these environments in a whole set of rules thereafter.
My initial thought was to create a rule running a script to install the specified packages.
For instance, my initial run was: snakemake -j1 --use-conda -R create_r_environment
.
My Snakefile:
rule create_r_environment:
conda:
"envs/r.yaml"
script:
"scripts/r-dependencies.R"
rule create_python_environment:
conda:
"envs/python.yaml"
script:
"scripts/python-dependencies.py"
My envs/r.yaml file:
channels:
- conda-forge
dependencies:
- r=4.0
My r-dependencies.R file:
remotes::install_github("ramiromagno/gwasrapidd", upgrade = "never")
My envs/pyton.yaml file:
channels:
- conda-forge
dependencies:
- python=3.8.2
My python-dependencies.py file:
!pip install gseapy
The log output:
Building DAG of jobs...
Creating conda environment envs/r.yaml...
Downloading and installing remote packages.
Environment for envs/r.yaml created (location: .snakemake/conda/388,repos = "http://cran.us.r-project.org")f7df8)
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 create_r_environment
1
[Fri Oct 30 22:38:56 2020]
rule create_r_environment:
jobid: 0
Activating conda environment: /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/conda/388f7df8
[Fri Oct 30 22:38:57 2020]
Error in rule create_r_environment:
jobid: 0
conda-env: /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/conda/388f7df8
RuleException:
CalledProcessError in line 5 of /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/Snakefile:
Command 'source /home/cmcouto-silva/miniconda3/bin/activate '/home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/conda/388f7df8'; set -euo pipefail; Rscript --vanilla /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/scripts/tmpa6jdxovx.r-dependencies.R' returned non-zero exit status 1.
File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2168, in run_wrapper
File "/home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/Snakefile", line 5, in __rule_create_r_environment
File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 529, in _callback
File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/concurrent/futures/thread.py", line 57, in run
File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 515, in cached_or_run
File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2199, in run_wrapper
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/log/2020-10-30T223743.852983.snakemake.log
My folder structure:
.
├── envs
│ ├── python.yaml
│ └── r.yaml
├── scripts
│ ├── python-dependencies.py
│ └── r-dependencies.R
└── Snakefile
It successfully creates the environment but fails when running the script, and I don't know why. I've changed the envs/r.yaml
file content to install.packages("data.table")
to see if there was an issue with the github package, but it's not. It fails anyway. The same occurs when I run the rule create_python_environment
(output not showed here).
Any help?
Edit after the accepted answer
As @dariober pointed out, I forgot to install the remotes
package before calling it in the script. I did it in the .yaml file, and it worked well. Also, I installed the pip libraries using shell instead of a python file.
I would like to highlight some points though, just in case anyone's facing the same or similar problem:
First, I could successfully install further packages I needed to, but some of them require specific libraries (e.g. libcurl), which is installed in my system, but it's not recognized inside the Snakemake conda environment, forcing me to either install it in the Snakemake conda environment (which is good for reproducibility, although I don't know how to do that yet) or specify the path library. Maybe a better option would be using a container just like @merv commented out.
Second, I figured out that Snakemake already provides a way to install pip libraries using the .yaml file. From the documentation, it looks like this:
name: stats2
channels:
- javascript
dependencies:
- python=3.6 # or 2.7
- bokeh=0.9.2
- numpy=1.9.*
- nodejs=0.10.*
- flask
- pip:
- Flask-Testing