I'm looking for a way to specify from the command line:
- the total number of threads to be used at the same time (even if spread over multiple jobs);
- the maximum number of jobs to run in parallel (which I already get with --jobs, so all is good here);
- if the maximum number of threads is lower than the threads specified for a particular rule, use the minimum of the two for that rule.
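To make the third requirement concrete, here is a minimal plain-Python sketch of the per-rule cap I have in mind. The function name `effective_threads` is hypothetical and not part of Snakemake; it only illustrates the rule "use the smaller of the rule's threads and the global maximum".

```python
def effective_threads(rule_threads: int, max_threads: int) -> int:
    """Threads one job should use: the rule's own request, capped at the global maximum.

    Hypothetical helper for illustration only; not a Snakemake API.
    """
    if rule_threads < 1 or max_threads < 1:
        raise ValueError("thread counts must be positive")
    return min(rule_threads, max_threads)


# A rule asking for 10 threads under a global cap of 4 gets 4;
# a rule asking for 1 thread is unaffected by the cap.
print(effective_threads(10, 4))  # 4
print(effective_threads(1, 4))   # 1
```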
My rules look like this:
rule a:
    input: "{sample}.in"
    output: "{sample}.out"
    threads: 10
    shell: "some-program --threads {threads}"

rule b:
    input: expand("{sample}.out", sample=SAMPLES)
    output: touch("done.done")
    threads: 1
    shell: "do something"
When I use --cluster to submit my jobs to the cluster through a wrapper for qsub, my command line looks like this:
snakemake --cluster "qsub-wrapper --threads {threads}" --jobs N
and hence I specify the number of threads to allocate per job. The --jobs parameter is then interpreted as the number of jobs to submit to the cluster in parallel, but it doesn't limit the overall number of threads that will be used.
So, for example, if I use --jobs 2, then 2 instances of rule a will run in parallel, occupying a total of 20 threads.
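A simplified plain-Python model of that arithmetic: it treats --jobs purely as a cap on how many jobs run at once, so the thread total is just the sum of whatever those jobs request. The function `threads_in_flight` is hypothetical and only illustrates the behaviour described above; it is not how Snakemake is implemented.

```python
from typing import List


def threads_in_flight(pending_thread_requests: List[int], max_jobs: int) -> int:
    """Total threads in use when up to max_jobs pending jobs run at once.

    Models --jobs as a cap on the *number* of concurrent jobs only;
    the threads each job requests are not limited by it.
    """
    running = pending_thread_requests[:max_jobs]
    return sum(running)


# Three pending instances of rule a (10 threads each) with --jobs 2:
print(threads_in_flight([10, 10, 10], max_jobs=2))  # 20
```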
The solution I found was to use --resources, adding to each rule:
    resources: nodes=NUMBER_OF_THREADS
where NUMBER_OF_THREADS is simply whatever I defined for threads, so the example from above looks like this:
rule a:
    input: "{sample}.in"
    output: "{sample}.out"
    threads: 10
    resources: nodes=10
    shell: "some-program --threads {threads}"

rule b:
    input: expand("{sample}.out", sample=SAMPLES)
    output: touch("done.done")
    threads: 1
    resources: nodes=1
    shell: "do something"
And now I run:
snakemake --cluster "qsub-wrapper --threads {threads}" --jobs N --resources nodes=10
Now, even though --jobs would allow 2 jobs to be submitted, only one is submitted because of the resources limit.
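The effect can be sketched with a deliberately simplified greedy model: a job starts only if both the job-count cap and the remaining `nodes` budget allow it. This is an assumption-laden illustration (the name `admit_jobs` is hypothetical), not Snakemake's actual scheduler, which uses its own selection strategy.

```python
from typing import List


def admit_jobs(node_requests: List[int], max_jobs: int, node_limit: int) -> List[int]:
    """Greedily pick jobs to start under a job-count cap and a shared resource budget.

    Simplified model, not Snakemake's scheduler: jobs are considered in
    order, and a job starts only if it fits in the remaining 'nodes' budget.
    """
    started: List[int] = []
    used = 0
    for request in node_requests:
        if len(started) == max_jobs:
            break
        if used + request <= node_limit:
            started.append(request)
            used += request
    return started


# Two instances of rule a (nodes=10 each), --jobs 2, --resources nodes=10:
print(admit_jobs([10, 10], max_jobs=2, node_limit=10))  # [10]
# A job requesting more than the whole budget can never start:
print(admit_jobs([10], max_jobs=2, node_limit=4))  # []
```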
Is there a better way to do this?
Also, is there a way to access the resources variable from within the Snakefile? I want this because I now face a different problem: if the resources given on the command line are lower than the threads specified for a rule, that rule is never submitted to the queue. So what I would like to do is something like this:
rule a:
    input: "{sample}.in"
    output: "{sample}.out"
    threads: min(10, command_line_specified_resources.nodes)
    resources: nodes=min(10, command_line_specified_resources.nodes)
    shell: "some-program --threads {threads}"
But I haven't found a way to access the resources specified on the command line (I tried seeing whether the workflow object has them, but I didn't see anything).
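In plain Python, the behaviour I'm after looks roughly like the sketch below. `CommandLineResources` is a hypothetical stand-in for the values passed via --resources (it is not a Snakemake object, and I don't know of a real one); the point is that threads and the `nodes` request are clamped together, so a job can never ask for more than the global budget.

```python
from typing import Dict, NamedTuple


class CommandLineResources(NamedTuple):
    """Hypothetical stand-in for the limits passed via --resources (not a Snakemake object)."""
    nodes: int


def clamp_rule(rule_threads: int, limits: CommandLineResources) -> Dict[str, int]:
    """Return threads and nodes for one rule, both capped at the command-line limit.

    Keeping the two values equal means the job's resource request
    always fits within the global 'nodes' budget.
    """
    capped = min(rule_threads, limits.nodes)
    return {"threads": capped, "nodes": capped}


command_line_specified_resources = CommandLineResources(nodes=4)
print(clamp_rule(10, command_line_specified_resources))  # {'threads': 4, 'nodes': 4}
```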
Thank you for your help!