
Snakemake's scheduler ignores my `mem_mb` declarations and executes in parallel jobs whose summed requirements exceed the available memory (e.g. three jobs with `mem_mb=53000` on a 128 GB system). Moreover, it even runs jobs whose declared requirements (over 1 TB when I run `snakemake -T10`) cannot be met on my system at all, even serially. Snakemake also seems to keep a job running even if it allocates much more memory than it declared.

What I have in mind is to tell Snakemake to expect a job to allocate up to a certain amount of memory, to plan the workflow accordingly, and to enforce the declared memory constraints. Is there any way to do so with Snakemake without resorting to serial execution?

I have a workflow with many calls to a rule which may be either quite memory-light or memory-heavy. As I want to benefit from parallel execution, I have declared that the job requires 1000 MiB on the first attempt and more on subsequent attempts. The rule looks like this:

def get_mem_mb(wildcards, attempt):
    # 1000 MiB on the first attempt, then 53000, 55000, 59000, ... MiB
    return 1_000 if attempt == 1 else (51_000 + 1_000 * 2 ** (attempt - 1))

rule Rulename:
    input:
        "{CONFIG}.ini"
    output:
        "{CONFIG}.h5"
    resources:
        mem_mb=get_mem_mb
    shell:
        "python script.py -o {output} -i {input}"

This is not a duplicate of Snakemake memory limiting, as the only answer there is incomplete (it does not cover the "memory limiting" part). At the moment, it is my question which has the complete (but split) answer.

  • Not sure I understand. Do you know how much memory a rule uses before it gets run? If not, how can snakemake know beforehand? – Maarten-vd-Sande Aug 10 '21 at 09:02
  • @Maarten-vd-Sande As I understand it, _Snakemake_ may use a trial-and-error mechanism. In my example it tries to run the job with `mem_mb=1000` on the first attempt, `mem_mb=53_000` on the second attempt, and so on. – abukaj Aug 10 '21 at 10:06

2 Answers


I'm basing this on a previous answer of mine, but I am not sure if I understand the question correctly. Perhaps you can run snakemake with e.g. --restart-times 10 and dynamic memory constraints:

rule all:
    input:
        "a"

def get_mem_mb(wildcards, attempt):
    """
    First attempt uses 10 MB, second attempt 100 MB, third 1 GB,
    and so on.
    """
    return 10**attempt

rule:
    output:
        "a"
    threads: 1
    resources:
        mem_mb = get_mem_mb
    shell:
        """
        # ulimit -v takes the limit in KiB, hence the multiplication by 1024
        ulimit -v $(({resources.mem_mb} * 1024))
        python3 -c 'import numpy; x=numpy.ones(1_000_000_000)'
        touch a
        """

Note that this will only work on Linux machines. What this approach does is set a memory limit for the rule: 10 MB on the first try. If the rule tries to allocate more than that, it crashes and gets restarted with a 100 MB limit. If that also fails, with 1 GB, and so on. You probably want to change how the `get_mem_mb` function scales, e.g. to something like `2**attempt`.
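
For instance, a minimal sketch of such a doubling scheme (the 1000 MB base value is an illustrative assumption, not something taken from the answer above):

def get_mem_mb(wildcards, attempt):
    """
    Double the limit on every retry: 2000 MB on the first attempt,
    4000 MB on the second, 8000 MB on the third, and so on.
    (The 1000 MB base is an arbitrary example value.)
    """
    return 1_000 * 2 ** attempt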

  • That is exactly what I do (`-T 10` is `--restart-times 10`). However, `mem_mb` does not seem to allocate memory (Ubuntu 20.04). Rather, it sets a limit on the job and kills it if the limit is exceeded. My problem is that the scheduler does not use that information to avoid resource exhaustion (and runs 3 jobs with `mem_mb=53000` in parallel on a 128 GB system). – abukaj Aug 10 '21 at 10:29
  • Oh, just now I have noticed the `ulimit` command. It seems that this feature is implemented by _Snakemake_ (v. 6.6.1). – abukaj Aug 10 '21 at 11:40
  • @abukaj I'm happy you found a solution. I was not aware that memory is now enforced! – Maarten-vd-Sande Aug 10 '21 at 14:03
  • Sadly I may have to take that (memory enforcing claim) back as I am unable to reproduce the "job killing" on purpose. :/ – abukaj Aug 10 '21 at 14:46
  • 1
    @abukaj Well if you are working on a linux-based system then ulimit will still work. Otherwise not sure.. – Maarten-vd-Sande Aug 11 '21 at 07:46
  • Ok, I take that back. It is something other than _Snakemake_ killing my jobs (probably something related to lack of memory), as one of four jobs has completed while the rest have been killed with signal 9. – abukaj Aug 11 '21 at 07:52
  • I have just discovered that it should be `ulimit -v $(({resources.mem_mb} * 1024))`, as `ulimit -v` takes the limit in KiB. The Bash multiplication is necessary because this cannot be solved with `params` (as you did in your previous answer), since they do not seem to be reevaluated on restart. :/ – abukaj May 17 '22 at 09:14

I have found a hint in a comment to a related question: How to set binding memory limits in snakemake

"ressources" in snakemake are [...] an arbitrary value that you assign to the whole snakemake run, and snakemake will not run simultaneously jobs whose total in that ressource goes above the assigned amount for the run

Thus, to prevent Snakemake from running memory-heavy jobs simultaneously, I need to tell it what the total limit is (e.g. `--resources mem_mb=100000`).
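
For example, a sketch of an invocation combining this global memory budget with the retry mechanism from the other answer (the `--cores` value here is an arbitrary placeholder):

# Cap the sum of mem_mb of simultaneously running jobs at ~100 GB
# and retry failed jobs (with an increased mem_mb) up to 10 times.
snakemake --cores 8 --resources mem_mb=100000 --restart-times 10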
