I am looking for some basic information about using Snakemake to handle my R pipeline. It is my understanding that the two most common ways of doing this are using the script directive and passing it an R script, or using shell and calling Rscript. If I want to use either of these methods, what should my R script look like? How does the R script know which file names were provided for input, so that it can either call load on an RData object or read.table to read a csv file?
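To make the question concrete, here is a rough sketch of the two setups I have in mind. The rule names, file names and script paths are all made up for illustration, and the comments reflect my current understanding rather than anything I have confirmed works for my pipeline:

```snakemake
# Snakefile (sketch only; every path and rule name here is hypothetical)

# Option 1: the script directive. As far as I understand, Snakemake exposes a
# `snakemake` object inside the R script, so the script could get its paths
# from snakemake@input[[1]] and snakemake@output[[1]].
rule clean_with_script:
    input:
        "data/raw.csv"
    output:
        "results/clean.RData"
    script:
        "scripts/clean.R"

# Option 2: shell plus Rscript. The file names are passed as command-line
# arguments, which the R script would read with commandArgs(trailingOnly = TRUE).
rule clean_with_shell:
    input:
        "data/raw.csv"
    output:
        "results/clean_shell.RData"
    shell:
        "Rscript scripts/clean_cli.R {input} {output}"
```

In either case, I assume the R script would still have to decide between load and read.table itself, for example based on the file extension of the path it receives.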
My other question is about cluster submission and multi-threading. Snakemake will supposedly submit tasks to multiple nodes and use multiple cores automatically, without code modification. So would it be better to have one rule using cluster and/or cores that calls an R script to execute the entire pipeline, or to break the pipeline into multiple rules/steps and use cluster and/or cores on each rule?
My final question: what if my R code itself uses functions from parallel/multi-threading packages, like mclapply? How does that affect Snakemake and its parameters?
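For example, I could imagine declaring the thread count on the rule and handing it to the R script, roughly as below. The rule, the file names, and the idea of passing the thread count as a third argument for mc.cores are my own guesses, not something I have verified:

```snakemake
# Snakefile (sketch only; rule name and paths are hypothetical)
rule fit_models:
    input:
        "results/clean.RData"
    output:
        "results/fits.RData"
    # Number of cores this job asks Snakemake to reserve.
    threads: 4
    shell:
        # The R script would read the third argument, convert it with
        # as.integer(), and pass it to mclapply(..., mc.cores = n).
        "Rscript scripts/fit.R {input} {output} {threads}"
```

What I do not know is whether Snakemake needs to be told about the cores that mclapply uses in this way, or whether the two are independent.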
I was unable to find answers to these questions online, so any info would be appreciated.