I'm creating a simple snakemake pipeline that contains global variables in the Snakefile. What's the recommended way to use these global variables in the Python scripts called in my rules?

I'm currently using argparse command line arguments, as described in "Snakemake: pass command line arguments to scripts", but am wondering if there's a better way.

amm
  • Have you read [the docs on external scripts and accessing variables](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#python)? – merv Aug 12 '21 at 22:35
  • @merv Thanks for your response. I have seen those but I think it's not quite what I'm looking for. I'm specifically trying to use variables defined at the top of my Snakefile (outside of rules and outside of the config file) that are not input, output, or config values. Perhaps argparse is the only way. – amm Aug 13 '21 at 17:49

1 Answer

Passing Variables

If the variable is defined in the Snakefile, it can be passed to the script via params. For example,

Snakefile

# global variable to use
FOO = 100

rule test:
  input: "a.in"
  output: "a.out"
  params:
    foo=FOO  # pass the variable value as 'foo'
  script: "scripts/test.py"

scripts/test.py

#!/usr/bin/env python

# access the variable through the `snakemake` object
print(snakemake.params.foo)

See Snakemake documentation on external Python scripts.
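
When developing such a script, it can help to be able to run it by hand as well. The following is a minimal sketch, not part of the Snakemake API: it assumes the script may be started outside Snakemake, in which case the injected `snakemake` object does not exist, and it substitutes a hypothetical stand-in built with `types.SimpleNamespace`. The `read_foo` helper and the fallback value of 100 are illustrative assumptions.

```python
#!/usr/bin/env python
from types import SimpleNamespace


def read_foo(smk):
    """Return the `foo` parameter from a snakemake-like object."""
    return smk.params.foo


try:
    # Snakemake injects `snakemake` when the script runs via `script:`.
    smk = snakemake  # noqa: F821
except NameError:
    # Hypothetical stand-in for running the script by hand (assumption).
    smk = SimpleNamespace(params=SimpleNamespace(foo=100))

print(read_foo(smk))
```

Keeping the lookup in a small function means the same code path is used whether the value comes from the rule's params or from the stand-in.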


Additional Comments

In general, I find it better practice to put a variable like the one above in a config.yaml instead. This centralizes adjustable parameters and provides a single point of configuration for reuse. Although snakemake.config is available in external scripts, I still prefer to pass configuration values explicitly as params, which makes it clear which rules depend on which configuration values.

Example

config.yaml

foo: 100

Snakefile

configfile: "config.yaml"

rule test:
  input: "a.in"
  output: "a.out"
  params:
    foo=config["foo"]  # pass the config value as 'foo'
  script: "scripts/test.py"

scripts/test.py

#!/usr/bin/env python

# access the variable through the `snakemake` object
print(snakemake.params.foo)

Overriding Configuration Parameters

If the value is provided in config.yaml, it can optionally be overridden at the command line:

snakemake --config foo=150

See documentation on configuration parameters.
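
The override can be pictured as a dictionary update: values from config.yaml are loaded first, and values given with `--config` replace matching keys. The sketch below is plain Python that mimics this precedence for flat keys only. It is an illustration, not Snakemake's actual implementation, and the `merge_config` helper and the key `bar` are hypothetical. It also shows `dict.get` with a default, one way to keep a key optional.

```python
def merge_config(file_values, cli_values):
    """Return file values overlaid with CLI values (CLI wins on conflicts)."""
    merged = dict(file_values)
    merged.update(cli_values)
    return merged


file_values = {"foo": 100}  # as loaded from config.yaml
cli_values = {"foo": 150}   # as given by `--config foo=150`

config = merge_config(file_values, cli_values)
print(config["foo"])  # value after the override

# `bar` is a hypothetical key that is set nowhere; .get avoids a KeyError.
print(config.get("bar", "None"))
```

In a Snakefile, `config` is an ordinary dictionary, so an expression such as `config.get("foo", 100)` can be used in `params` when a key may be absent.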

merv
  • Thank you, this makes perfect sense! Is there a way to make these config values optional in the command line? – amm Aug 15 '21 at 02:57
  • @amm I added a note to the answer re: CLI overrides – merv Aug 15 '21 at 05:21
  • Thanks for the further explanation. I find that I get a KeyError when I have a default value in the `config.yaml` (i.e. `comp_het="None"`) but I don't specify it in the command line (`snakemake -c1 --config id=1 genome_build="hg19"`). I'd like it to be that config values are optional in the command line so that I don't have to include `comp_het` when I'm not changing the default value. – amm Aug 17 '21 at 17:58
  • @amm are you using YAML syntax? I.e., `comp_het="None"` is invalid - should be `comp_het: "None"` in the YAML. – merv Aug 17 '21 at 18:06
  • Sorry for the lack of clarity. In the YAML file, I have `comp_het: "None"` and in the command line, it would be `--config comp_het="None"`. – amm Aug 17 '21 at 18:43
  • Still, it sounds like your `config.yaml` is not being loaded correctly. Run it through a linter to make sure the YAML is valid (note that YAML has some whitespace sensitivity). Also, you added the `configfile: "config.yaml"` at the top of the Snakefile? Not sure what else would lead it to fail to load the config file values. – merv Aug 18 '21 at 17:52
  • Aha! I had `config: "config.yaml"` at the top of my Snakefile, rather than `configfile: "config.yaml"`. Thanks so much. – amm Aug 20 '21 at 02:35