
I am developing an ATAC-seq pipeline that runs Genrich with Snakemake.

Genrich can call peaks from more than one replicate in a single step, which avoids additional steps (e.g. IDR).

In Snakemake, I have found a way to collect all the samples I want (i.e. the replicates of one condition) at the same time, but Genrich expects its input files to be comma-separated, or space-separated if each file is quoted.

By default, the input is rendered as a space-separated list of files (e.g. file1 file2 file3). Since I don't know how to make it return comma-separated files instead, I tried to quote them.

In theory, since Snakemake version 5.8.0, you can refer to the input as {input:q} in a rule's shell command to get the quoted input, as described here.

However, in my case, the returned input is not quoted.

I have created a test rule to see how the input is returned:

rule genrich_merge_test:
    input:
        lambda w: expand("{condition}.sorted.bam", condition = SAMPLES.loc[SAMPLES["CONDITION"] == w.condition].NAME),
    output:
        "{condition}_peaks.narrowPeak",
    shell:
        """
        echo {input:q} > {output}     
        """

The input as written to the output file is:

rep1.sorted.bam rep2.sorted.bam
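To see what the input function in the test rule produces, here is a plain-Python sketch of the same selection. It assumes a sample sheet with NAME and CONDITION columns, as the rule implies; the rows and the helper name `replicate_bams` are hypothetical, and a list of dicts stands in for the pandas `SAMPLES` DataFrame so the example runs without pandas or Snakemake.

```python
# Hypothetical sample sheet standing in for the pandas SAMPLES DataFrame
# (columns NAME and CONDITION, as used in the test rule).
SAMPLES = [
    {"NAME": "rep1", "CONDITION": "treated"},
    {"NAME": "rep2", "CONDITION": "treated"},
    {"NAME": "rep3", "CONDITION": "control"},
]


def replicate_bams(condition):
    """Hypothetical helper: return the sorted BAM of every replicate in one
    condition, mirroring expand("{condition}.sorted.bam", condition=<NAME values>)."""
    names = [row["NAME"] for row in SAMPLES if row["CONDITION"] == condition]
    return [f"{name}.sorted.bam" for name in names]


files = replicate_bams("treated")
print(files)            # ['rep1.sorted.bam', 'rep2.sorted.bam']
print(" ".join(files))  # rep1.sorted.bam rep2.sorted.bam
```

The second line matches the output file above: a list placed in `{input}` is rendered with its items joined by spaces.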

Does anyone know how to get the input quoted, or how to return a comma-separated list of files instead of a space-separated one?

Thank you.

2 Answers


I suspected that echo and the shell might be stripping the quotes before the output is redirected to the file, but running snakemake -p to print the command being executed shows that the quotes are never there. It seems that quotes only appear around individual filenames that need them, for example when spaces are present.
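This behavior matches Python's `shlex.quote`, which returns a string unchanged when it contains only shell-safe characters and wraps it in single quotes otherwise. A minimal sketch, assuming that `{input:q}` applies `shlex.quote` to each file name (my understanding of Snakemake's quoting, not something shown in the question); `quote_each` is a hypothetical helper name:

```python
import shlex


def quote_each(files):
    """Hypothetical helper: quote each file the way {input:q} is assumed to,
    then join them with spaces."""
    return " ".join(shlex.quote(f) for f in files)


print(quote_each(["rep1.sorted.bam", "rep2.sorted.bam"]))
# rep1.sorted.bam rep2.sorted.bam   (no quotes: nothing needs escaping)
print(quote_each(["rep 1.sorted.bam", "rep2.sorted.bam"]))
# 'rep 1.sorted.bam' rep2.sorted.bam
```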

dariober's answer should work for quoting the list, but for completeness: if you want a comma-separated list of files, use a lambda function in a params directive:

rule genrich_merge_test:
    input:
        lambda w: expand("{condition}.sorted.bam", 
                         condition=SAMPLES.loc[SAMPLES["CONDITION"] == w.condition].NAME),
    params:
        files=lambda wildcards, input: ','.join(input)
    output:
        "{condition}_peaks.narrowPeak",
    shell:
        """
        echo {params.files} > {output}     
        """
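Outside Snakemake, the params function is just a Python callable. Here it is called directly with a plain list standing in for the rule's input (Snakemake's input object behaves like a list of paths), to show the string it produces. The `None` passed for `wildcards` is only a placeholder, since this function ignores it.

```python
# The same callable as in the params directive above.
files_param = lambda wildcards, input: ",".join(input)

# A plain list stands in for Snakemake's input object.
print(files_param(None, ["rep1.sorted.bam", "rep2.sorted.bam"]))
# rep1.sorted.bam,rep2.sorted.bam
```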

EDIT

Here is a toy example demonstrating the use of params with input:

# snakefile
inputs = expand('{wc}.out', wc=range(4))

rule all:
    input: "test_peaks.narrowPeak"

rule genrich:
    input:
        inputs
    params:
        files=lambda wildcards, input: ','.join(input)
    output:
        "test_peaks.narrowPeak",
    shell:
        """
        echo {params.files} > {output}     
        """

rule generator:
    output: touch('{file}.out')
$ snakemake -np
...
rule genrich:
    input: 0.out, 1.out, 2.out, 3.out
    output: test_peaks.narrowPeak
    jobid: 1


        echo 0.out,1.out,2.out,3.out > test_peaks.narrowPeak 
...
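For a single wildcard, `expand` formats the pattern once per value, so the toy rule's input list and the joined params string in the dry run can be reproduced in plain Python. `expand_single` is a hypothetical, simplified stand-in; Snakemake's real `expand` also handles several wildcards and their combinations.

```python
def expand_single(pattern, **values):
    """Hypothetical stand-in for Snakemake's expand() with exactly one wildcard."""
    (name, seq), = values.items()  # exactly one keyword argument expected
    return [pattern.format(**{name: v}) for v in seq]


inputs = expand_single("{wc}.out", wc=range(4))
print(inputs)            # ['0.out', '1.out', '2.out', '3.out']
print(",".join(inputs))  # 0.out,1.out,2.out,3.out
```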

Also, as indicated here:

Note that in contrast to the input directive, the params directive can optionally take more arguments than only wildcards, namely input, output, threads, and resources.
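The idea behind that note is that a params function can name the extra arguments it wants. The sketch below is a hypothetical illustration of such dispatch using `inspect.signature`, not Snakemake's actual implementation; `call_with_requested` and the `available` values are made up for the example.

```python
import inspect


def call_with_requested(func, wildcards, **available):
    """Hypothetical illustration: pass wildcards positionally, then only those of
    input/output/threads/resources that the function's signature names."""
    requested = list(inspect.signature(func).parameters)[1:]  # skip wildcards
    kwargs = {name: available[name] for name in requested if name in available}
    return func(wildcards, **kwargs)


available = {
    "input": ["0.out", "1.out"],
    "output": ["test_peaks.narrowPeak"],
    "threads": 4,
    "resources": {},
}

print(call_with_requested(lambda wildcards, input: ",".join(input), None, **available))
# 0.out,1.out
print(call_with_requested(lambda wildcards, threads: f"threads={threads}", None, **available))
# threads=4
```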

Troy Comi
Hi, I think `','.join(input)` in the params directive doesn't work. The params directive has access to wildcards but not to the input/output lists. – dariober Oct 27 '20 at 13:50

Assuming your input filenames do not contain spaces (and if they do, I strongly encourage avoiding them), you can simply put the whole list of files in quotes; you don't need to quote each file in the list:

rule genrich:
    input:
        t=['a.bam', 'b.bam'],
    ...
    shell:
        r"""
        Genrich -t '{input.t}' ...
        """

(Note the single quotes around '{input.t}'.)
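To see why the outer quotes are enough, the sketch below uses Python's `shlex.split`, which follows POSIX shell splitting rules, to show how the shell tokenizes the rendered command. Only the `-t` part of the answer's command is reproduced; the elided options are left out.

```python
import shlex

files = ["a.bam", "b.bam"]

# What '{input.t}' renders to: the space-joined list inside one pair of single quotes.
command = "Genrich -t '" + " ".join(files) + "'"
print(command)               # Genrich -t 'a.bam b.bam'

# How a POSIX shell splits it: the whole list arrives as ONE argument to -t.
print(shlex.split(command))  # ['Genrich', '-t', 'a.bam b.bam']

# Without the quotes, each file becomes a separate argument instead.
print(shlex.split("Genrich -t " + " ".join(files)))  # ['Genrich', '-t', 'a.bam', 'b.bam']
```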

dariober