Snakemake for R pipeline -- SETUP

Question

I am looking for some basic information about using Snakemake to handle my R pipeline. It is my understanding that the two most common ways of doing this are the script directive, which is given an R script, or a shell command that calls Rscript. If I want to use either of these methods, what should my R script look like? How does the R script know which input file it was given, so that it can call load on an RData object or read.table on a csv file?

The other question is about cluster submission and multi-threading. Snakemake will supposedly submit tasks to multiple nodes and use multiple cores automatically, without code modification. So would it be better to have one rule, using cluster and/or cores, that calls an R script to execute the entire pipeline, OR to break the pipeline into multiple rules/steps and use cluster and/or cores on each of those rules?

The final question: what if my R code already uses parallel/multi-threaded functions such as mclapply within the code itself? How does that interact with Snakemake and its parameters?

I was unable to find answers to these questions online, so any info would be appreciated.

Have you taken a look at the [snakemake](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html) documentation? It's pretty well explained in section "External scripts". In essence, within your R script (which you call as a `script` inside your snakemake rule) you automatically have access to an S4 object called `snakemake` which allows you to address the input and output objects. — Maurits Evers, Jul 03 '19 at 01:01
Okay great, guess I missed this. Any idea about my other two questions? @MauritsEvers — abbas786, Jul 03 '19 at 01:32
Note that if you are able to make a command-line interface for your R script (library `docopt` can be useful for that), you can use it like any command-line tool in a `shell` section. — bli, Jul 04 '19 at 15:46
You should probably break your question into three separate posts. People who have an answer to only one of your questions would feel more comfortable posting it, and it would let you use a more specific title and tags, making your questions more discoverable. — bli, Jul 04 '19 at 15:49

Hamid G · Answer 1 · 2019-09-10T13:22:33.527

I had a similar problem to the one in your first question. This is all that the Snakemake documentation provides about using an R script in a Snakemake pipeline:

In the R script, an S4 object named snakemake analog to the Python case above is available and allows access to input and output files and other parameters. Here the syntax follows that of S4 classes with attributes that are R lists, e.g. we can access the first input file with snakemake@input[[1]] (note that the first file does not have index 0 here, because R starts counting from 1). Named input and output files can be accessed in the same way, by just providing the name instead of an index, e.g. snakemake@input[["myfile"]].

I followed these instructions and wrote the first lines of my R script like this:

avinp = read.table(snakemake@input[["avi"]], header = FALSE, sep = "\t")
anno = read.table(snakemake@input[["anno"]], header = TRUE, sep = "\t")

Here avi and anno are the names of the inputs in the Snakemake rule. You can do the same for your output(s) with snakemake@output.
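To make that concrete, here is a minimal sketch of what a complete script could look like. It is not taken from my actual pipeline: the rule name, the file paths, the output name `summary`, and the column-counting step are all made up for illustration. The matching rule is shown as comments at the top so that everything stays in one R file. The sketch also reads `snakemake@threads`, which should hold the value of the rule's `threads:` setting, and passes it to `mclapply`.

```r
# Hypothetical rule in the Snakefile that would run this script
# (rule name, paths, and the output name "summary" are illustrative only):
#
#   rule annotate:
#       input:
#           avi="data/sample.avinput",
#           anno="data/annotation.tsv"
#       output:
#           summary="results/summary.tsv"
#       threads: 4
#       script:
#           "scripts/annotate.R"

library(parallel)

# Named inputs are looked up by the names used in the rule.
avinp <- read.table(snakemake@input[["avi"]], header = FALSE, sep = "\t")
anno <- read.table(snakemake@input[["anno"]], header = TRUE, sep = "\t")

# Use the thread count granted to the rule instead of a hard-coded number.
n_cores <- snakemake@threads

# Placeholder computation: count the distinct values in each column of avinp.
# (anno is read only to mirror the lines above; a real script would use it.)
n_distinct <- mclapply(avinp, function(col) length(unique(col)),
                       mc.cores = n_cores)

summary_table <- data.frame(column = names(avinp),
                            n_distinct = unlist(n_distinct))

# Named outputs work the same way as named inputs.
write.table(summary_table, snakemake@output[["summary"]],
            sep = "\t", quote = FALSE, row.names = FALSE)
```

Taking the core count from `snakemake@threads` means the scheduler knows how many cores the job will occupy, so the parallelism inside R and the `--cores` limit given to Snakemake should not fight each other. Note that `mclapply` relies on forking, so `mc.cores` greater than 1 will not work on Windows.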
