
I want to set a maximum number of CPUs used by a Snakemake pipeline on a SLURM cluster, so that the number of jobs submitted in parallel is determined by the CPU limit rather than by the --jobs parameter.

Snakemake version 7.20.0

So I thought that snakemake --local-cores 240 --jobs 20 --cluster-config cluster.yaml --latency-wait 60 --cluster 'sbatch -t {cluster.time} --mem={cluster.mem} -c {cluster.cpus} -o {cluster.output} -e {cluster.error}'

would schedule at most 20 jobs in parallel (right? the help text "Use at most N CPU cluster/cloud jobs in parallel" is a bit unclear, but I figured it means 'jobs' in my case), with at most 240 CPUs used on the host machine at the same time.

So if I have 20 jobs with 40 threads (or CPUs) each, I expected only 6 jobs to be scheduled at a time, because 6x40=240 and I set --local-cores 240 ("use at most 240 cores of the host machine in parallel").

But all 20 jobs are still scheduled with 40 CPUs each, so 800 CPUs are booked while I expected at most 240 CPUs to be used.

Do I need to set --jobs to unlimited so that the number of jobs sent in parallel is determined by --local-cores?

1 Answer


My understanding of --local-cores and --jobs is that --local-cores defines how many threads of localrules can run on the main Snakemake process. That is independent of the number of cluster jobs that can be actively submitted to the scheduler at once, which is determined by --jobs.
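To illustrate (a minimal sketch; the rule, file names, and shell command are made up): only rules declared under `localrules` run on the head node, and only their threads count against --local-cores, while every other rule is submitted to SLURM and counts against --jobs:

```
# Hypothetical Snakefile fragment, assuming a rule named make_summary.
# Rules listed under `localrules` execute in the main Snakemake process;
# their threads are throttled by --local-cores, not by --jobs.
localrules: make_summary

rule make_summary:
    input: "results/all.tsv"
    output: "results/summary.txt"
    threads: 4    # counted against --local-cores because the rule is local
    shell: "cut -f1 {input} | sort -u > {output}"
```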

As an aside, it shouldn't matter how many jobs you have in your SLURM queue as long as you don't go over some maximum (set by --jobs). If your admin only allows 240 cores at once, the extra jobs will sit in the queue until one finishes. That's not a huge deal, and you may get better priority by submitting those jobs ahead of time instead of throttling with Snakemake.

To solve your problem, though, I would define a separate resource, call it cores, equal to the number of threads for each rule. When you invoke Snakemake, you say how many cores you want to use with the --resources option. The downside is the code duplication for every rule you want to limit.

rule my_job:
    threads: 20
    resources:
        cores=lambda wc, threads: threads
    ....

snakemake --resources cores=240
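To see why this caps concurrency, here is a toy illustration in plain Python (not Snakemake's actual scheduler; the `schedule` helper and job dictionaries are made up for this sketch). A callable resource like `cores=lambda wc, threads: threads` is evaluated per job, and the scheduler only admits jobs while the sum of their `cores` values stays within the --resources limit:

```python
# Toy sketch of how a callable resource limits concurrent jobs.
# This is NOT Snakemake internals; it just mimics the accounting.

def schedule(jobs, limit):
    """Greedily admit jobs while the total 'cores' resource stays <= limit."""
    running, used = [], 0
    for job in jobs:
        # Evaluate the callable resource with (wildcards, threads),
        # as in: resources: cores=lambda wc, threads: threads
        cores = job["resources"]["cores"](None, job["threads"])
        if used + cores <= limit:
            running.append(job["name"])
            used += cores
    return running, used

# 20 identical jobs requesting 40 threads each, limited to cores=240:
jobs = [
    {"name": f"job{i}", "threads": 40,
     "resources": {"cores": lambda wc, threads: threads}}
    for i in range(20)
]
active, used = schedule(jobs, 240)
# Only 6 jobs fit (6 x 40 = 240); the rest wait until a slot frees up.
```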

Troy Comi
  • Hi Troy, thanks for your answer, but I am not sure I understand the difference between threads and cores in your example. Also, I think my problem comes from the fact that there is no cores/nodes limit set by the admin on the cluster I use, just a recommendation, so I wanted Snakemake to adapt the number of jobs it sends according to that recommendation, something like "send as many jobs as you want but use at most 240 cores on the cluster". I thought that was the purpose of --local-cores. But maybe there is a way to limit my account with a SLURM profile instead of doing it through Snakemake – Dav.Dep Mar 21 '23 at 10:47
  • The threads determine how many cpus are *requested* from slurm, per job. The cores resource limits how many total cores are *active* on the queue or running by snakemake. They are the same number but used differently by snakemake. – Troy Comi Mar 21 '23 at 14:12