I want to include RSeQC results in a MultiQC report within a Snakemake workflow. My problem is that one of the RSeQC tools only produces a .r and a .pdf file, while MultiQC seems to require a .txt input to create a plot.

Does anyone have working Snakemake code that feeds RSeQC output into a MultiQC rule?

As this involves a combination of three tools, it is difficult to get support.


Here is my code. Only the geneBodyCoverage.txt RSeQC output ends up in the report; the two .r outputs are ignored, in particular junctionSaturation_plot.r, for which RSeQC produces nothing other than the .r and the png picture.

rule multiqc_global:
  """
  Aggregate all MultiQC reports
  """
  input:
    expand("intermediate/{smp}_fastqc.zip", smp=SAMPLES),
    expand("intermediate/merged_{smp}_fastqc.zip", smp=SAMPLES),
    expand("logs/star/{smp}_Log.final.out", smp=SAMPLES),
    expand("intermediate/{smp}.geneBodyCoverage.txt", smp=SAMPLES),
    expand("intermediate/{smp}.geneBodyCoverage.r", smp=SAMPLES),
    expand("intermediate/{smp}.junctionSaturation_plot.r", smp=SAMPLES),
  output:
    html = "results/global_multiqc.html",
    stats = "intermediate/global_multiqc_general_stats.txt"
  log:
    "logs/multiqc/global_multiqc.log"
  version: "1.0"
  shadow: "minimal"
  shell: 
    """
    # Run multiQC and keep the html report
    multiqc -n multiqc.html {input} 2> {log}
    mv multiqc.html {output.html}
    mv multiqc_data/multiqc_general_stats.txt {output.stats}
    """
merv
splaisan
  • Hi, if you want help with a combination of products, it would be best to tag the question with all of them. I see that there are no specific StackOverflow tags for the other two products, but please add the existing tag(s) that best describe them. Thanks. – MandyShaw Nov 01 '18 at 18:29
  • Are you running at least one of the RSeQC [modules supported by multiqc](https://multiqc.info/docs/#rseqc)? The input isn't always a `*.txt` file; the file used [depends on the module](https://github.com/ewels/MultiQC/blob/9e58729dcc28ebef974561fed0aee026a0f3b3a4/multiqc/utils/search_patterns.yaml#L419). – Manavalan Gajapathy Nov 01 '18 at 19:19
  • Hi @JeeYem; I tried adding the .r file to the inputs too, but only the QC from the txt file was generated. – splaisan Nov 02 '18 at 20:00
  • Do you see any RSeQC results in your multiqc output? Or do you see them but not for all samples? If the latter, you could try `--dirs-depth` in case the problem is caused by identical sample names. Also, try `--verbose` to see what multiqc is doing, which could tell you where the problem is. – Manavalan Gajapathy Nov 03 '18 at 04:30
  • Note - MultiQC uses different files from rseqc for different result types. For some, it uses the R scripts (as this is the only place to get the numbers from, aside from PDF). See the search patterns used here: https://github.com/ewels/MultiQC/blob/feb8c96b6320b23652307e5ffdc2bb994be112c1/multiqc/utils/search_patterns.yaml#L448-L470 – ewels Mar 20 '19 at 22:07

1 Answer

This is sort of anecdotal, since as @JeeYem pointed out in a comment, it could depend on what analysis you're running with RSeQC. Here's how I use the read_distribution.py analysis in snakemake, which generates a compatible file that MultiQC recognizes.

rule read_distribution:
    input:
        bam = "data/bam/{srr}.bam",
        bed = config["gencodeBED"]
    output:
        "qc/aligned/{srr}.read_distribution.txt"
    shell:
        """
        read_distribution.py -i {input.bam} -r {input.bed} &> {output}
        """

Basically, just redirect the stdout and stderr streams to a file. Hopefully, it's a similar thing for the other RSeQC scripts.
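
For cases where a plain shell redirect is inconvenient (for example inside a Snakemake `run:` block or a helper script), the same idea can be sketched in Python. This is only an illustration and not part of the original answer: `run_and_capture` is a hypothetical helper name, and the command passed to it is whatever tool you want to capture.

```python
import subprocess
import sys
from pathlib import Path


def run_and_capture(cmd, report_path):
    """Run cmd and write its stdout and stderr to report_path.

    Hypothetical helper: mirrors the `&> file` redirect used in the
    shell directive of the rule above.
    """
    report_path = Path(report_path)
    # Create the output directory if it does not exist yet.
    report_path.parent.mkdir(parents=True, exist_ok=True)
    with report_path.open("w") as handle:
        # stderr=subprocess.STDOUT sends both streams to the same file.
        result = subprocess.run(cmd, stdout=handle, stderr=subprocess.STDOUT)
    # Hand the exit status back so the caller can decide how to react.
    return result.returncode
```

A call might look like `run_and_capture(["read_distribution.py", "-i", bam, "-r", bed], out_txt)`, where `bam`, `bed` and `out_txt` are placeholders for your own paths. Returning the exit code rather than raising leaves it to the calling rule to fail the job.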

merv
  • Hi Mandy. I tried, but my low credits do not allow me to create tags. If you could create the tags, I would appreciate it. – splaisan Nov 02 '18 at 19:49
  • Added my code above, which only outputs one item out of the three coming from RSeQC. – splaisan Nov 02 '18 at 20:07