
My question is very similar to this one.

I am writing a Snakemake pipeline that does a lot of pre- and post-alignment quality control. At the end of the pipeline, I run MultiQC on those QC results.

Basically, the workflow is: preprocessing -> fastqc -> alignment -> post-alignment QCs such as picard, qualimap, and preseq -> peak calling -> motif analysis -> multiQC.

MultiQC should generate a report on all of those outputs, as long as it supports them.

One way to force MultiQC to run at the very end is to list all the output files from the upstream rules in the input directive of the multiqc rule, as below:

rule a:
  input: "a.input"
  output: "a.output"
  
rule b:
  input: "b.input"
  output: "b.output"
  
rule c:
  input: "b.output"
  output: "c.output"
  
rule multiqc:
  input: "a.output", "c.output"
  output: "multiqc.output"

However, I want a more flexible approach that doesn't depend on specific upstream output files, so that when I change the pipeline (adding or removing rules), I don't need to change the dependencies of the multiqc rule. The input to multiqc should simply be a directory containing all the files that I want MultiQC to scan.

In my situation, how can I force the multiqc rule to execute at the very end of the pipeline? Or is there a general way to force a certain rule in Snakemake to run as the last job? Perhaps there is some Snakemake configuration such that, no matter how I change the pipeline, this rule will execute at the end. I am not sure whether such a method exists.

Thanks very much for helping!

luomengt
  • Welcome to SO. Your question isn't entirely clear to me. What do you mean by *have three different QC but only one is run*? Why is only one run: by your choice, or is this a problem you're trying to solve? The sentence that starts with *since* does not make sense, please rephrase. Why does the solution in the linked question not work in your case? What's different? When you say optional, how is it decided whether they are run or not? – Cornelius Roemer Jul 02 '21 at 17:20
  • @CorneliusRoemer Hi Cornelius, I have edited my question and hopefully it makes more sense now. Please let me know if it's not clear. – luomengt Jul 02 '21 at 19:19
  • Yes, now I'm starting to see what you're getting at! So you want to be flexible about the exact QC programs you want to run? There's probably a good Snakemake way of doing this, will think about it. – Cornelius Roemer Jul 02 '21 at 20:20
  • @CorneliusRoemer Yes, or some steps may be skipped in the pipeline, but MultiQC should still work properly. So I was wondering whether there is some way, through the Snakefile configuration or otherwise, to make a certain step execute last. Thanks. – luomengt Jul 02 '21 at 20:33

2 Answers


From your comments I gather that what you really want to do is run a flexibly configured set of QC methods and then summarise them at the end. The summary should run only once all the QC methods you want to run have completed.

Rather than manually forcing the MultiQC rule to be executed at the end, you can set up the MultiQC rule so that it is automatically executed last, by requiring the QC methods' outputs as its input.

Your goal of flexibly configuring which QC rules to run can easily be achieved by passing the names of the QC rules through a config file or, even easier, as a command line argument.

Here is a minimal working example for you to extend:

###Snakefile###

rule end:
    input: 'start.out', 
           expand('opt_{qc}.out',qc=config['qc'])

rule start:
    output: 'start.out'

rule qc_a:
    input: 'start.out'
    output: 'opt_a.out'
    #shell: #whatever qc method a needs here

rule qc_b:
    input: 'start.out'
    output: 'opt_b.out'
    #shell: #whatever qc method b needs here

This is how you configure which QC method to run:

snakemake -npr end --config "qc=['b']"  # run just method b
snakemake -npr end --config "qc=['a','b']"  # run methods a and b
snakemake -npr end --config "qc=[]"  # run no QC method
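
One gap in the example above is that `config['qc']` raises a `KeyError` when no `qc` value is passed at all. A minimal sketch of a helper that falls back to an empty list, assuming the same `opt_{qc}.out` naming as the rules above; `qc_targets` is a hypothetical name, not part of Snakemake:

```python
def qc_targets(config, pattern="opt_{qc}.out"):
    """Return the QC output files requested via config['qc'] (none if unset)."""
    # Hypothetical helper: default to no QC methods when the key is missing.
    methods = config.get("qc", [])
    # Accept a single name as well as a list, e.g. --config qc=a
    if isinstance(methods, str):
        methods = [methods]
    return [pattern.format(qc=method) for method in methods]


# Possible use in the Snakefile, replacing the expand() call in rule end:
#     input: 'start.out', qc_targets(config)
```

Since a Snakefile is Python with extra syntax, the function can be defined at the top of the Snakefile itself.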
Cornelius Roemer
  • I understand what you mean. But I am looking for a method that doesn't require those dependencies (which files are required, which QC methods are used). So perhaps there is some Snakemake configuration such that, no matter how I change the pipeline, this rule will execute at the end. I am not sure whether such a method exists. – luomengt Jul 05 '21 at 15:19

It seems that the onsuccess handler in Snakemake is what I am looking for.
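
A minimal sketch of that approach, assuming the QC results are written under a directory named `qc` and that `multiqc` is on the PATH. The helper `multiqc_command` and both directory names are hypothetical. Snakemake executes the `onsuccess` handler after the workflow finishes without errors, so it does not need to list any upstream outputs:

```python
import shlex


def multiqc_command(scan_dir, out_dir="multiqc_report"):
    """Build a MultiQC command line that scans scan_dir and writes to out_dir."""
    # --force overwrites an existing report; --outdir sets the report directory.
    return (
        f"multiqc --force {shlex.quote(scan_dir)} "
        f"--outdir {shlex.quote(out_dir)}"
    )


# Possible use in the Snakefile (the directory name "qc" is an assumption):
#
# onsuccess:
#     shell(multiqc_command("qc"))
```

Because the handler sits outside the DAG, the report is not a tracked output of any rule, so Snakemake will not notice if it is missing or out of date.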

luomengt