Hi, I'm new to Snakemake and have a question. I want to run a tool on multiple data sets. Each data set represents one tissue, and each tissue has FASTQ files stored in its own tissue directory. The rough command for the tool is:
```bash
python TEcount.py -rosette rosettefile -TE te_references -count result/tissue/output.csv -RNA <LIST OF FASTQ FILES FOR THE RESPECTIVE SAMPLE>
```
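To make the per-tissue structure concrete outside of Snakemake, here is a minimal plain-Python sketch of what the workflow has to do for each tissue: list the tissue directories under `data/rnaseq/`, collect each one's FASTQ files, and build the `TEcount.py` argument list. The helper names `collect_fastq_by_tissue` and `build_tecount_command`, and the set of FASTQ extensions, are hypothetical assumptions for illustration, not part of TEtools. Note that `os.listdir` returns bare file names, so they must be joined with the directory to get usable paths.

```python
import os
import shlex

# Assumed FASTQ file extensions (hypothetical); adjust to match the real data.
FASTQ_EXTENSIONS = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def collect_fastq_by_tissue(root):
    """Map each tissue subdirectory of `root` to a sorted list of FASTQ paths."""
    tissues = {}
    for tissue in sorted(os.listdir(root)):
        tissue_dir = os.path.join(root, tissue)
        if not os.path.isdir(tissue_dir):
            continue  # skip stray files that are not tissue directories
        # os.listdir returns bare names, so join them with the directory.
        tissues[tissue] = [
            os.path.join(tissue_dir, name)
            for name in sorted(os.listdir(tissue_dir))
            if name.endswith(FASTQ_EXTENSIONS)
        ]
    return tissues


def build_tecount_command(tissue, fastqs, rosette, te_references,
                          script="TEcount.py"):
    """Return the TEcount.py argument list for one tissue (hypothetical helper)."""
    return [
        "python", script,
        "-rosette", rosette,
        "-TE", te_references,
        "-count", os.path.join("result", tissue, "output.csv"),
        "-RNA", *fastqs,  # every FASTQ file of this tissue, one argument each
    ]


if __name__ == "__main__" and os.path.isdir("data/rnaseq"):
    # Print one command per tissue instead of running it.
    for tissue, fastqs in collect_fastq_by_tissue("data/rnaseq").items():
        cmd = build_tecount_command(tissue, fastqs, "rosettefile", "te_references")
        print(shlex.join(cmd))
```

Each tissue becomes one independent job with its own FASTQ list, which is exactly the unit a `{sample}` wildcard should stand for.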
The tissues should be the wildcards. How can I do this? Below is my first attempt, which did not work:
```python
import os

# collect data sets
SAMPLES = os.listdir("data/rnaseq/")

rule all:
    input:
        expand("results/{sample}/TEtools.{sample}.output.csv", sample=SAMPLES)

rule run_TEtools:
    input:
        TEcount='scripts/TEtools/TEcount.py',
        rosette='data/prepared_data/rosette/rosette',
        te_references='data/prepared_data/references/all_TE_instances.fa'
    params:
        # collect the fastq files in the tissue directory
        fastq_files = os.listdir("data/rnaseq/{sample}")
    output:
        'results/{sample}/TEtools.{sample}.output.csv'
    shell:
        'python {input.TEcount} -rosette {input.rosette} -TE '
        '{input.te_references} -count {output} -RNA {params.fastq_files}'
```
In the `params` section of rule `run_TEtools`, Snakemake does not know what `{sample}` is.
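A likely explanation: `os.listdir("data/rnaseq/{sample}")` is ordinary Python, so it runs once when the Snakefile is parsed, with the literal text `{sample}`, before any wildcard value exists. Snakemake fills in wildcards only in plain strings and in functions it calls later with a `wildcards` object, so a common fix is an input function. Below is a minimal sketch of such a function in plain Python; the names `fastq_for_sample` and `FASTQ_PATTERNS`, and the accepted extensions, are hypothetical assumptions for illustration.

```python
import glob
import os

# Assumed FASTQ naming patterns (hypothetical); adjust to match the real data.
FASTQ_PATTERNS = ("*.fastq", "*.fq", "*.fastq.gz", "*.fq.gz")


def fastq_for_sample(wildcards):
    """Input function: return every FASTQ path in data/rnaseq/<sample>/.

    Snakemake calls an input function with the job's wildcards, so
    wildcards.sample is already a concrete tissue name at this point.
    """
    sample_dir = os.path.join("data", "rnaseq", wildcards.sample)
    paths = []
    for pattern in FASTQ_PATTERNS:
        # glob returns paths that include sample_dir, unlike os.listdir
        paths.extend(glob.glob(os.path.join(sample_dir, pattern)))
    return sorted(paths)
```

In the Snakefile, the function would be defined above the rule and passed without parentheses under `input:`, for example `fastq=fastq_for_sample`, then referenced as `-RNA {input.fastq}` in the shell command; Snakemake calls it once per job and joins the returned list with spaces. Listing the FASTQ files as input rather than params also lets Snakemake check that they exist. The same directory issue affects `SAMPLES`, which with `os.listdir` would also pick up any stray files in `data/rnaseq/`.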