
I have one input file in which each line corresponds to one sequence, and I need to run multiple checks on each of these sequences (I already do this with a Python script containing multiple functions). Some of these checks (functions) do not depend on each other and can run concurrently, so I thought of using Snakemake.

The problem is that most examples use many input files, whereas I have only one file but need to run different shell commands on each line of it. Does anyone have ideas or examples?

My second question: some of the functions in my Python script don't write files but just return something, while most Snakemake examples I've seen have an output that is a file. How can I deal with those functions in a Snakemake workflow? That is, how can I pass arguments between different functions, rules, etc.? I hope it's clear what I am asking. Thanks.

I did go through the tutorials and some examples online.

My Python script looks like:

def function1(arg1, arg2):
    ...
    return List

def function2(arg1, arg2):
    ...  # [write a file]

def function3(arg1, arg2):
    ...
    print('blah blah')

def main():
    function1(A, B)
    function2(A, B)
    function3(A, B)

if __name__ == "__main__":
    main()

I have no error messages, but I don't know how to convert my script, with its many functions, into a Snakemake workflow.

Sally

1 Answer


You may be able to define separate Snakemake rules for your functions, provided you come up with a system that creates files (it could be as simple as printing the result of your function to a file). Indeed, Snakemake decides which rules to run based on which files it has to generate.

With this, rules that do not depend on one another will be able to run in parallel.
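As a minimal sketch of "printing the result of your function to a file", the snippet below shows one step saving a return value and another step loading it back. It is plain Python rather than Snakemake syntax; `function1` here is a hypothetical stand-in for the real check, and `save_result`, `load_result`, and the JSON format are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def function1(arg1, arg2):
    # Hypothetical stand-in for the real check: returns a list.
    return [arg1, arg2, arg1 + arg2]


def save_result(result, path):
    # What an upstream rule would do: persist the return value as JSON.
    Path(path).write_text(json.dumps(result))


def load_result(path):
    # What a downstream rule would do: read the value back from its input file.
    return json.loads(Path(path).read_text())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "function1_result.json"
        save_result(function1("AC", "GT"), out)
        print(load_result(out))
```

In a workflow, the saving step would sit in the `run:` block of the rule that declares the file as `output`, and the loading step in a rule that declares it as `input`. The Snakefile that follows uses the same idea, writing plain text instead of JSON.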

from contextlib import redirect_stdout

def function1(arg1, arg2):
    ...  # body elided

def function2(arg1, arg2):
    ...  # body elided

def function3(arg1, arg2):
    ...  # body elided

A = ...
B = ...

rule all:
    input:
        "function1_result.txt",
        "function2_result.txt",
        "function3_result.txt"

rule run_function1:
    output:
        "function1_result.txt",
    run:
        l = function1(A, B)
        # Write one item per line to this rule's output file.
        with open(output[0], "w") as fh:
            print(*l, sep="\n", file=fh)

rule run_function2:
    output:
        "function2_result.txt",
    run:
        # Assuming this writes "function2_result.txt":
        function2(A, B)

rule run_function3:
    output:
        "function3_result.txt",
    run:
        # Capture what function3 prints into this rule's output file.
        with open(output[0], "w") as fh:
            # see https://stackoverflow.com/a/55833804/1878788
            with redirect_stdout(fh):
                function3(A, B)

Note that this will not process the lines of your input file in parallel.
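One possible way to get per-line parallelism, sketched here under assumptions, is to split the input so that each sequence gets its own file, and then let a rule with a wildcard handle each file independently. The helper below is plain Python; `split_sequences` and the `seq_<i>.txt` naming are hypothetical choices for illustration, not something from the original script.

```python
import tempfile
from pathlib import Path


def split_sequences(input_path, out_dir):
    """Write each non-empty line of input_path to out_dir/seq_<i>.txt.

    Returns the list of ids (as strings), numbered from 0 in input order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(input_path) as fh:
        # Blank lines are skipped, so ids stay consecutive.
        seqs = [line.strip() for line in fh if line.strip()]
    ids = []
    for i, seq in enumerate(seqs):
        (out_dir / f"seq_{i}.txt").write_text(seq + "\n")
        ids.append(str(i))
    return ids


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "sequences.txt"
        src.write_text("ACGT\nTTGA\n")
        print(split_sequences(src, Path(tmp) / "chunks"))
```

If this runs at the top of the Snakefile, the returned ids could be passed to Snakemake's `expand()` in `rule all`, with a wildcard rule taking one `seq_{i}.txt` file as input, so that independent per-sequence jobs can be scheduled together when several cores are given (for example `snakemake --cores 4`). The cost is more files and more I/O.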

bli
  • Thanks for your suggestion. I ended up doing something similar. In my situation it causes a lot of I/O, but it gets the job done. Thank you – Sally Jul 18 '19 at 18:00