snakemake: is there a way to specify an output directory for each rule?

Question

My scripts all write their output files to the current directory, i.e. wherever the script is called from. In my shell-script pipeline I therefore use cd commands to move into a particular directory before running each command, so that the output files end up in the relevant directories. My scripts have no output-directory parameter, and most of them deduce the output file names from the input. That has worked pretty well for me.

Now I keep running into this output directory issue, because with snakemake the files seem to end up in the directory where the Snakefile is. I could modify all the scripts to take an additional output-directory parameter, but changing that many scripts is going to be a pain. Is there any way to specify where the output should go for each specific rule?

score 3 · Answer 1 · answered Dec 05 '16 at 08:20

One hack would be to first cd into the output directory, i.e. "cd $(dirname {output[0]})". This needs to be the first of your shell commands.

Having said this, it would be better to change the script to accept an output directory as argument.
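One thing to watch with the cd approach: once the shell has changed directory, a relative input path no longer points at the same file, which is one possible reason the hack fails at first try. Below is a minimal sketch of the idea in plain Python. The helper name cd_then_run and the example paths are hypothetical and not part of snakemake; the point is only that the input is resolved to an absolute path before the cd.

```python
import os
import shlex


def cd_then_run(script, input_path, output_path):
    """Build a shell command that runs `script` from the output's directory.

    Hypothetical helper for illustration only; not a snakemake function.
    """
    out_dir = os.path.dirname(output_path) or "."
    # Resolve the input BEFORE changing directory; a relative path would
    # otherwise be looked up inside out_dir.
    abs_input = os.path.abspath(input_path)
    return "cd {} && {} {}".format(
        shlex.quote(out_dir), shlex.quote(script), shlex.quote(abs_input)
    )


# Example paths are made up for the demonstration.
cmd = cd_then_run("script1", "raw/sample.txt", "script1_out/sample.out")
print(cmd)
```

The printed command has the form "cd script1_out && script1 <absolute path to raw/sample.txt>", so the script still finds its input after the directory change.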

Andreas

couldn't get it to work easily somehow, i guess i will have to modify my scripts.. – olala Dec 05 '16 at 20:52

bli · Answer 2 · 2016-12-08T18:00:43.477

Here is an example rule that I use in one of my snakefiles:

rule link_raw_data:
    output:
        OPJ(data_dir, "{lib}_{rep}.fastq.gz"),
    params:
        directory = data_dir,
        shell_command = lib2data,
    message:
        "Making link to raw data {output}."
    shell:
        """
        (
        cd {params.directory}
        {params.shell_command}
        )
        """

This is probably a bit different from your situation, but hopefully some of the techniques can help. In particular, note the parentheses in the shell section and the usage of a params section to define the output directory.

I'm not sure I'm doing this in the most elegant way, but it works.

data_dir is a parameter read from a config file.

lib2data is a function that generates commands based on the values of some wildcards. Of course, I have to make sure that these commands use the correct input file paths (and, in this case, that the output they produce is consistent with what the output section declares). In your case, you may simply have "hard-coded" shell commands, possibly using some of the rule's input.
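For readers wondering what the two undefined names in the rule might look like, here is a rough sketch. It assumes that OPJ is shorthand for os.path.join and that lib2data looks up a raw file from the wildcards; the RAW_FILES table, its archive path, and the SimpleNamespace stand-in for snakemake's wildcards object are all made up for illustration.

```python
import os
from types import SimpleNamespace

# Assumption: the answer uses OPJ without defining it; os.path.join is a
# plausible definition.
OPJ = os.path.join

# Stand-in for the value that would be read from the config file.
data_dir = "data"

# Hypothetical lookup table from (lib, rep) to the raw file location.
RAW_FILES = {("WT", "1"): "/archive/run42/WT_1.fastq.gz"}


def lib2data(wildcards):
    """Return the shell command that links the raw file for these wildcards."""
    source = RAW_FILES[(wildcards.lib, wildcards.rep)]
    # The link name is relative because the rule cds into data_dir first;
    # it has to match the pattern declared in the rule's output section.
    return "ln -s {} {}_{}.fastq.gz".format(
        source, wildcards.lib, wildcards.rep
    )


# SimpleNamespace mimics attribute access on a wildcards object.
wc = SimpleNamespace(lib="WT", rep="1")
print(OPJ(data_dir, "{lib}_{rep}.fastq.gz"))
print(lib2data(wc))
```

Because the function is passed under params, it receives the wildcards of the job being run, and the generated command is executed after the cd into params.directory.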

More streamlined example

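# After the cd, {input} (relative to the working directory) is one level up,
# hence the ../ prefix in the script calls.
rule run_script1:
    input:
        "path/to/initial/input"
    output:
        "script1_out/output1"
    shell:
        """
        cd script1_out
        script1 ../{input}
        """

rule run_script2:
    input:
        "script1_out/output1"
    output:
        "script2_out/output2"
    shell:
        """
        cd script2_out
        script2 ../{input}
        """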

Starting from these examples, you can use functions of the wildcards in the input or output if necessary.

thanks, i'm wondering what do the parentheses mean in the shell section? — olala, Dec 07 '16 at 19:25
Actually, I realize that in this context the parentheses are useless, because there are no other commands after them. Commands placed after the closing parenthesis would run in the working directory as it was before the `cd`. — bli, Dec 07 '16 at 20:48
you mean the parentheses group the commands inside into one block, so they are executed together and thus in params.directory? and outside of the parentheses, other commands work in the working directory? — olala, Dec 07 '16 at 21:19
Yes, that's how they could be useful, but my example is not relevant in this regard. — bli, Dec 07 '16 at 21:52
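The behaviour of the parentheses discussed in these comments can be checked outside snakemake. The following sketch assumes a POSIX sh is available and uses a throwaway temporary directory; it runs a cd inside parentheses and then prints the working directory again after the closing parenthesis.

```python
import os
import subprocess
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    # The cd inside ( ... ) happens in a subshell, so the second pwd
    # still reports the directory the command started in.
    result = subprocess.run(
        ["sh", "-c", "(cd sub && pwd); pwd"],
        cwd=tmp,
        capture_output=True,
        text=True,
        check=True,
    )
    inside, outside = result.stdout.splitlines()

print("inside parentheses:", inside)
print("after parentheses: ", outside)
```

The first path ends in sub and the second is its parent, which is why the parentheses only matter when further commands follow them in the same shell block.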

score 2 · Answer 3 · answered Dec 06 '16 at 14:31

From the snakemake documentation:

"All paths in the snakefile are interpreted relative to the directory snakemake is executed in. This behaviour can be overridden by specifying a workdir in the snakefile:"

workdir: "path/to/workdir"

So just put that at the beginning of your snakefile and all inputs and outputs will be interpreted relative to this path.

Eric C.

right, i understand this but it's not solving the question i'm asking... – olala Dec 07 '16 at 19:18

score 1 · Answer 4 · answered Dec 03 '16 at 19:56

You could try using a configuration file, in either YAML or JSON. Then use the directory as a parameter in your expand calls or in the input/output of your rules.

See the snakemake documentation on configuration.
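As a rough illustration of the configuration idea, the sketch below keeps one output directory per rule in a config mapping and builds output paths from it. The out_dirs key, the directory names, and the out_path helper are hypothetical; in a Snakefile the config dictionary would be filled by the configfile directive rather than by json.loads.

```python
import json
import os

# Stand-in for what snakemake would load from a JSON or YAML config file.
# The "out_dirs" key and its entries are made up for this example.
config = json.loads(
    '{"out_dirs": {"script1": "script1_out", "script2": "script2_out"}}'
)


def out_path(rule_name, filename):
    """Hypothetical helper: put filename in the directory set for a rule."""
    return os.path.join(config["out_dirs"][rule_name], filename)


print(out_path("script1", "output1"))
print(out_path("script2", "output2"))
```

The same directory value can then be used in the rule's shell section for the cd, so the path declared in output and the place where the script actually writes stay in sync.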

rioualen

i don't think this will work as i still need to pass the parameter into the script and my script doesn't take that parameter yet – olala Dec 04 '16 at 20:57
You can use the parameter in the `shell` section, as in my answer: http://stackoverflow.com/a/40998525/1878788. – bli Dec 08 '16 at 17:25
