
I'm trying to base my cluster memory allocation for a given rule on the file size of an input file. Is this possible in Snakemake and, if so, how?

So far I have tried specifying it in the resources: section like so:

rule compute2:
    input: "input1.txt"
    output: "input2.txt"
    resources:
        mem_mb=lambda wildcards, input, attempt: int(os.path.getsize(str(input))/(1024*1024))
    shell: "touch input2.txt"

But it seems Snakemake attempts to calculate this up front, before the file gets created, as I'm getting this error:

InputFunctionException in line 35 of test_snakemake/Snakefile:
FileNotFoundError: [Errno 2] No such file or directory: 'input1.txt'

I'm running Snakemake with the following command:

snakemake --verbose -j 10 --cluster-config cluster.json --cluster "sbatch -n {cluster.n} -t {cluster.time} --mem {resources.mem_mb}"
KBoehme

2 Answers


If you want to do this dynamically per rule as per the question, you can use something along these lines:

resources: mem_mb=lambda wildcards, input, attempt: (input.size//1000000) * attempt * 10

Here input.size//1000000 converts the cumulative size of the input files from bytes to MB, the attempt factor scales the request on each retry, and the trailing 10 is an arbitrary multiplier you can tune to the specifics of your shell/script requirements.
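
For context, here is roughly how that plugs into the rule from the question (a sketch; the file names come from the question and the multiplier is arbitrary):

rule compute2:
    input: "input1.txt"
    output: "input2.txt"
    resources:
        # input.size is the cumulative size of all input files in bytes;
        # attempt scales the request up on each retry
        mem_mb=lambda wildcards, input, attempt: (input.size // 1000000) * attempt * 10
    shell: "touch {output}"

Combined with --restart-times, the attempt factor means a job that fails (e.g. for exceeding its memory limit) is resubmitted with a larger request.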

django43

This is possible using the --default-resources option. As Snakemake's --help output explains:

In addition to plain integers, python expressions over input size are allowed (e.g. '2*input.size_mb'). When specifying this without any arguments (--default-resources), it defines 'mem_mb=max(2*input.size_mb, 1000)' and 'disk_mb=max(2*input.size_mb, 1000)', i.e., default disk and mem usage is twice the input file size but at least 1GB.
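
For example, appended to the command line from the question, that might look like this (a sketch using the default expression quoted above; adjust it to your needs):

snakemake --verbose -j 10 --cluster-config cluster.json \
    --cluster "sbatch -n {cluster.n} -t {cluster.time} --mem {resources.mem_mb}" \
    --default-resources "mem_mb=max(2*input.size_mb, 1000)"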

DriesB