
I used Snakemake on an LSF cluster before and everything worked just fine. However, I recently migrated to an SGE cluster and I am getting a very strange error when I try to run a job with more than one wildcard.

When I try to submit a job based on this rule

rule download_reads :
    threads : 1
    output : "data/{sp}/raw_reads/{accesion}_1.fastq.gz"
    shell : "scripts/download_reads.sh {wildcards.sp} {wildcards.accesion} data/{wildcards.sp}/raw_reads/{wildcards.accesion}"

I get the following error (details of snakemake_clust.sh below):

./snakemake_clust.sh data/Ecol1/raw_reads/SRA123456_1.fastq.gz                                          
Building DAG of jobs...
Using shell: /bin/bash
Provided cluster nodes: 10
Job counts:
        count   jobs
        1       download_reads
        1

[Thu Jul 30 12:08:57 2020]
rule download_reads:
    output: data/Ecol1/raw_reads/SRA123456_1.fastq.gz
    jobid: 0
    wildcards: sp=Ecol1, accesion=SRA123456

scripts/download_reads.sh Ecol1 SRA123456 data/Ecol1/raw_reads/SRA123456
Unable to run job: ERROR! two files are specified for the same host
ERROR! two files are specified for the same host
Exiting.
Error submitting jobscript (exit code 1):

Shutting down, this might take some time.

When I replace the sp wildcard with a constant, it works as expected:

rule download_reads :
    threads : 1
    output : "data/Ecol1/raw_reads/{accesion}_1.fastq.gz"
    shell : "scripts/download_reads.sh Ecol1 {wildcards.accesion} data/Ecol1/raw_reads/{wildcards.accesion}"

I.e. I get

Submitted job 1 with external jobid 'Your job 50731 ("download_reads") has been submitted'.

I wonder why I am having this problem; I am sure I used exactly the same rule on the LSF-based cluster before without any problems.

Some details

The Snakemake submission script looks like this:

#!/usr/bin/env bash

mkdir -p logs

snakemake $@ -p --jobs 10 --latency-wait 120 --cluster "qsub \
    -N {rule} \
    -pe smp64 \
    {threads} \
    -cwd \
    -b y \
    -o \"logs/{rule}.{wildcards}.out\" \
    -e \"logs/{rule}.{wildcards}.err\""

-b y makes the command be executed as it is, -cwd changes the working directory on the compute node to the working directory from which the job was submitted. The other flags / specifications are, I hope, clear.

Also, I am aware of the --drmaa flag, but I think our cluster is not well configured for that. Until now, --cluster has been the more robust solution.

-- edit 1 --

When I execute exactly the same Snakefile locally (on the frontend, without the --cluster flag), the script gets executed as expected. It seems to be a problem with the interaction between Snakemake and the scheduler.

Kamil S Jaron

1 Answer

-o \"logs/{rule}.{wildcards}.out\" \                                                                                                                                           
-e \"logs/{rule}.{wildcards}.err\""   

This is a random guess... Multiple wildcards are concatenated with a space before being substituted into logs/{rule}.{wildcards}.err. So even though you use double quotes, SGE treats the resulting string as two files and throws the error. What if you use single quotes instead? Like:

-o 'logs/{rule}.{wildcards}.out' \
-e 'logs/{rule}.{wildcards}.err'
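
For illustration, and assuming this guess is right, the -o / -e options for the failing job would render to something like the lines below; the space inside the file name is what SGE ends up reading as two files:

    -o logs/download_reads.Ecol1 SRA123456.out \
    -e logs/download_reads.Ecol1 SRA123456.err \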

Alternatively, you could concatenate the wildcards in the rule and use the result on the command line. E.g.:

rule one:
    params:
        # join all wildcard values into one string, e.g. "Ecol1_SRA123456"
        wc = lambda wc: '_'.join(wc)
    output: ...

Then use:

-o 'logs/{rule}.{params.wc}.out' \
-e 'logs/{rule}.{params.wc}.err'
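
Applied to the rule from the question, this workaround would look roughly like the sketch below (the params name wc is arbitrary; the lambda joins the wildcard values into a single space-free string such as Ecol1_SRA123456):

rule download_reads :
    threads : 1
    params :
        # join the wildcard values into one space-free string, e.g. "Ecol1_SRA123456"
        wc = lambda wc: '_'.join(wc)
    output : "data/{sp}/raw_reads/{accesion}_1.fastq.gz"
    shell : "scripts/download_reads.sh {wildcards.sp} {wildcards.accesion} data/{wildcards.sp}/raw_reads/{wildcards.accesion}"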

(This second solution, if it works, kind of sucks though)

dariober
  • Wow, thanks. I did not realize that the problem would be in my log specifications. The single-quote solution changes nothing, and the second solution would indeed generate so much boilerplate in my Snakemake file that I refuse to do that. Fortunately, just writing `-o logs -e logs` produces really nice output by default (jobname == rulename, plus the job ID, which means that at least every log will be unique, I take it); see the sketch after these comments. – Kamil S Jaron Aug 06 '20 at 10:54
  • I proposed an edit according to the solution I used in the end. – Kamil S Jaron Aug 06 '20 at 10:56
  • @KamilSJaron If you are happy with your solution, maybe it would be better to post it as its own answer (not that I mind the edit). – dariober Aug 06 '20 at 11:01
  • I am happy to accept yours, you figured out where the problem was in the first place :-) – Kamil S Jaron Aug 06 '20 at 11:17
  • @KamilSJaron What happens when you add an additional pair of quotes, like `'\"logs/{rule}.{wildcards}.out\"'` (maybe you need to play with the escapes)? I wonder if the first pair of quotes gets consumed before it reaches qsub. – dariober Aug 07 '20 at 08:00
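
A sketch of the submission script adjusted along the lines of the fix mentioned in the comments above (assuming `-o logs -e logs` was indeed the final form; when given a directory, SGE names each file <jobname>.o<jobid> / <jobname>.e<jobid>, so together with -N {rule} every job gets its own log):

#!/usr/bin/env bash

mkdir -p logs

snakemake $@ -p --jobs 10 --latency-wait 120 --cluster "qsub \
    -N {rule} \
    -pe smp64 {threads} \
    -cwd \
    -b y \
    -o logs \
    -e logs"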