
In Snakemake, you can call external scripts like so:

rule NAME:
    input:
        "path/to/inputfile",
        "path/to/other/inputfile"
    output:
        "path/to/outputfile",
        "path/to/another/outputfile"
    script:
        "path/to/script.R"

This gives convenient access to an S4 object named snakemake inside the R script. In my case, I am running Snakemake on a SLURM cluster, and I need to load R with module load R/3.6.0 before Rscript can be executed; otherwise the job fails with:

/usr/bin/bash: Rscript: command not found

How can I tell Snakemake to do that? If I run the rule as a shell command instead of a script, my R script unfortunately has no access to the snakemake object, so this is not a desirable solution:

shell:
    "module load R/3.6.0;"
    "Rscript path/to/script.R"
bgbrink

3 Answers


You cannot run a shell command from the script directive; you have to use the shell directive. You can always pass your inputs and outputs as arguments:

rule NAME:
    input:
        in1="path/to/inputfile",
        in2="path/to/other/inputfile"
    output:
        out1="path/to/outputfile",
        out2="path/to/another/outputfile"
    shell:
        """
        module load R/3.6.0
        Rscript path/to/script.R {input.in1} {input.in2} {output.out1} {output.out2}
        """

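and read the arguments in the R script:

# Arguments arrive in the order they appear in the shell command
args <- commandArgs(trailingOnly = TRUE)
if (length(args) != 4) stop("Expected 4 arguments: in1 in2 out1 out2")
inFile1 <- args[1]
inFile2 <- args[2]
outFile1 <- args[3]
outFile2 <- args[4]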

Using a conda environment:

You can specify a conda environment to use for a specific rule:

rule NAME:
    input:
        in1="path/to/inputfile",
        in2="path/to/other/inputfile"
    output:
        out1="path/to/outputfile",
        out2="path/to/another/outputfile"
    conda: "r.yml"
    script:
        "path/to/script.R"

and in your r.yml file:

name: rEnv
channels:
  - r
dependencies:
  - r-base=3.6

Then when you run snakemake:

snakemake .... --use-conda

Snakemake will create all environments before running, and each environment will be activated inside the job sent to SLURM.
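Because this approach keeps the script directive, the script still receives the snakemake object, so inputs and outputs can be read by name (in R, snakemake@input[["in1"]]). The sketch below shows the same idea for a Python script, where the injected object exposes snakemake.input.in1 and snakemake.output.out1. The function name concat_inputs and what it writes are hypothetical, chosen only for illustration; the function takes the object as a parameter so it can also be exercised outside Snakemake with a stand-in.

```python
from pathlib import Path


def concat_inputs(smk) -> None:
    """Concatenate the two named inputs into out1 and write a line count to out2.

    `smk` is expected to be the `snakemake` object that the `script:`
    directive injects; only its `.input` and `.output` attributes are used.
    """
    # Access inputs and outputs by the names given in the rule, not by position.
    text = Path(smk.input.in1).read_text() + Path(smk.input.in2).read_text()
    Path(smk.output.out1).write_text(text)
    Path(smk.output.out2).write_text(f"{len(text.splitlines())}\n")
```

Inside a real Snakemake script you would call concat_inputs(snakemake); for a local test, a types.SimpleNamespace with matching input and output attributes (an assumption, not part of Snakemake) can stand in for the injected object.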

Eric C.
  • Yeah, this has been my temporary solution. But it's less robust than accessing the arguments by name, because you have to remember which argument goes in which position in your next script. – bgbrink Aug 08 '19 at 12:48
  • @bgbrink yes indeed. The only other alternative that I see is to use conda and specify an environment with R 3.6 for your rule. Another one that I don't like (not very reproducible) is to load your module before running snakemake and pass your environment variables (including your PATH) to SLURM. The latter solution has drawbacks if you want to load R ONLY for this rule. – Eric C. Aug 08 '19 at 15:54
  • To follow up on this... how do you specify a conda environment for a rule? Or is the snakemake environment that launches the jobs propagated to all the children, i.e. if I install R in my snakemake environment, will the jobs know about it? – bgbrink Aug 09 '19 at 05:18
  • @bgbrink I added a conda example. As for sending your environment variables to SLURM, sorry but I cannot help. I use SGE and put `snakemake --cluster "qsub -V ..."` to send my PATH and other variables to SGE jobs. – Eric C. Aug 09 '19 at 10:24

If your concern is being able to refer to the arguments by name in the Rscript command, you could do something like this (basically an extension of Eric's answer):

rule NAME:
    input:
        in1="path/to/inputfile",
        in2="path/to/other/inputfile"
    output:
        out1="path/to/outputfile",
        out2="path/to/another/outputfile"
    shell:
        r"""
        module load R/3.6.0
        Rscript path/to/script.R \
            inFile1={input.in1} inFile2={input.in2} \
            outFile1={output.out1} outFile2={output.out2}
        """

Then inside script.R you access each argument by parsing the command line:

args <- commandArgs(trailingOnly = TRUE)

for(x in args){
    if(grepl('^inFile1=', x)){
        inFile1 <- sub('^inFile1=', '', x)
    }
    else if(grepl('^inFile2=', x)){
        inFile2 <- sub('^inFile2=', '', x)
    }
    else if(grepl('^outFile1=', x)){
        outFile1 <- sub('^outFile1=', '', x)
    }
    else if(grepl('^outFile2=', x)){
        outFile2 <- sub('^outFile2=', '', x)
    }
    else {
        stop(sprintf('Unrecognized argument %s', x))
    }
}
# Do stuff with inFile1, inFile2, etc...

Also consider a library designed for command-line parsing; personally, I'm quite happy with argparse for R.
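The name=value convention is not specific to R. As a language-neutral sketch, here is the same parsing in Python: instead of one branch per argument, every argument is split on its first "=" and collected into a dictionary, with errors for malformed, unknown, duplicated, or missing names. The function parse_named_args and the expected names are illustrative assumptions, not part of any library.

```python
import sys


def parse_named_args(argv, expected):
    """Parse arguments of the form name=value into a dict.

    Raises ValueError for malformed, unknown, duplicated or missing names.
    Only the first "=" splits name from value, so values may contain "=".
    """
    parsed = {}
    for arg in argv:
        name, sep, value = arg.partition("=")
        if not sep:
            raise ValueError(f"Malformed argument {arg!r}; expected name=value")
        if name not in expected:
            raise ValueError(f"Unrecognized argument {arg!r}")
        if name in parsed:
            raise ValueError(f"Duplicate argument {name!r}")
        parsed[name] = value
    missing = [n for n in expected if n not in parsed]
    if missing:
        raise ValueError(f"Missing arguments: {', '.join(missing)}")
    return parsed


if __name__ == "__main__":
    args = parse_named_args(
        sys.argv[1:], expected=("inFile1", "inFile2", "outFile1", "outFile2")
    )
    print(args)
```

Unlike the R loop, this also reports arguments that were never passed, which catches a typo in the rule's shell command early.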

dariober

Maybe you are looking for envmodules, a Snakemake directive that loads cluster environment modules, just like module load. Snakemake only loads them when it is run with --use-envmodules:

rule your_rule:
    input:
    output:
    envmodules:
        "R/3.6.0"
    shell:
        "some Rscript"
atongsa