I'm starting to experiment with using containers with Snakemake, and I have a question about what needs to be pre-built into the container and what doesn't. For example:
I want to run a Python script (stored in workflow_root/scripts/myScript.py, for example) in a container, with input piped in from another program. Do I need to build the Python script into the container or declare it as an input file, or is it already accessible from within the container (and if so, how do I point to it)? My current rule looks something like:
rule myRule:
    params:
        sample = get_sample,
        basePath = sys.path[0]
    input:
        in1=get_in1,
        in2=get_in2
    output:
        out1 = "{runPath}/{sample}_read1_dcs.fq.gz",
        out2 = "{runPath}/{sample}_read2_dcs.fq.gz"
    priority: 50
    conda:
        "envs/myEnv.yaml"
    log:
        "{runPath}/logs/{sample}_myRule.log"
    shell:
        """
        set -e
        set -o pipefail
        set -x
        {{
        picard FastqToSam \
        F1={input.in1} \
        F2={input.in2} \
        O=/dev/stdout \
        SM={params.sample} \
        TMP_DIR=picardTempDir \
        SORT_ORDER=unsorted \
        | python3 {params.basePath}/scripts/myScript.py \
        --input /dev/stdin \
        --prefix {wildcards.sample}
        }} 2>&1 | tee -a {log}
        """
I want to run bwa with a sizable user-provided reference. Can the container use that reference as it is, or would I need to build the reference into the container? (I'd also like to use Ensembl VEP, which has its own sizable reference database to deal with.)
I suppose what my question boils down to is: what files / locations does Snakemake mount into the container, and where do I find them when I'm writing rules involving shell commands? The documentation doesn't seem to be very clear on this, and it would be nice to be able to figure it out without having to do a bunch of experimentation.