
I am trying to use a file that is written during the run as input to another rule, but I always get the error FileNotFoundError: [Errno 2] No such file or directory:

Is there a way to fix this, or another implementation that achieves the same logic?

    def vc_list(wildcards):
        my_list = []
        with open(wildcards.mydir + "/file_B.txt", 'r') as data_in:
            for line in data_in:
                my_list.append(line.strip())
        return my_list

    # rule A processes file_A.txt and produces file_B.txt
    rule A:
        input: "{mydir}/file_A.txt"
        output: "{mydir}/file_B.txt"
        shell: "seq 1 5 > {output}"  # assume `seq 1 5` is the output of processing the file

    rule B:
        input: "{value}"
        output: "{value}.vc"
        shell: "pythoncode.py {input} {output}"

    # rule C reads file_B.txt to get the list of values used to expand its input; rule B produces those files
    rule C:
        input:
            processed_file = rules.A.output,  # "{mydir}/file_B.txt"
            my_list = lambda wildcards: expand("{mydir}/{value}.vc", mydir=wildcards.mydir, value=vc_list(wildcards))
        output: "{mydir}/done.txt"
        shell: "touch {output}"
    # I always get the error that "{mydir}/file_B.txt" does not exist

The error now:

    test_loop.snakefile: FileNotFoundError: [Errno 2] No such file or directory: 'read_file/file_B.txt' Wildcards: mydir=read_file

Thanks,

Medhat
  • Does `rule C` start before `rule A` (and `rule B`) has finished? You could add an `input` to `rule C` based on `rules.A.output` etc. Otherwise all the rules may try to run simultaneously. – Chris_Rands Jul 25 '19 at 09:05
  • The run will be something like `snakemake mydir/done.txt`. Even when I put `{mydir}/file_B.txt` as a required input for rule `C`, I still get the same error. I updated the code above to give you a real example. – Medhat Jul 25 '19 at 14:31

2 Answers


The answer to my question is to use a checkpoint, because `dynamic` will be deprecated. Here is how the logic should be changed:

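    rule all:
        input: 'done.txt'

    # A checkpoint tells Snakemake to re-evaluate the DAG once this job has run
    checkpoint A:
        output: 'B.txt'
        shell: 'seq 1 2 > {output}'


    rule N:
        input: "genome.fa"
        output: '{num}.bam'
        shell: "touch {output}"

    rule B:
        input: '{num}.bam'
        output: '{num}.vc'
        shell: "touch {output}"


    # If checkpoint A has not run yet, checkpoints.A.get() makes Snakemake postpone
    # this function and call it again after A finishes, so B.txt exists when read
    def aggregate_input(wildcards):
        with open(checkpoints.A.get(**wildcards).output[0], 'r') as f:
            return [num.rstrip() + '.vc' for num in f]

    rule C:
        input: aggregate_input
        output: touch('done.txt')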

Credit goes to Eric Lim
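For the question's original `{mydir}` layout, the body of the input function is plain Python. The sketch below assumes `file_B.txt` holds one value per line; the helper name `vc_paths` is hypothetical. In the Snakefile, rule A would become `checkpoint A`, and rule C's input would be something like `lambda wildcards: vc_paths(checkpoints.A.get(mydir=wildcards.mydir).output[0], wildcards.mydir)`, so the file is only read after the checkpoint has produced it.

```python
import os
import tempfile
from typing import List


def vc_paths(file_b: str, mydir: str) -> List[str]:
    """Build the '{mydir}/{value}.vc' targets from a checkpoint output file.

    Hypothetical helper: file_b is assumed to be the path returned by
    checkpoints.A.get(mydir=...).output[0]; each non-empty line is one value.
    """
    with open(file_b, "r") as data_in:
        values = [line.strip() for line in data_in if line.strip()]
    # Same pattern the question passes to expand("{mydir}/{value}.vc", ...)
    return [f"{mydir}/{value}.vc" for value in values]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as mydir:
        file_b = os.path.join(mydir, "file_B.txt")
        with open(file_b, "w") as out:
            out.write("1\n2\n3\n4\n5\n")  # what `seq 1 5` would write
        print(vc_paths(file_b, mydir))
```

Skipping blank lines is a small design choice: a trailing empty line in `file_B.txt` would otherwise produce a target named `{mydir}/.vc`.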

Medhat

Your script fails even before the workflow starts, during the pipeline construction phase.

So there is nothing surprising about rules A and B: Snakemake reads their input and output sections and finds no problem with them. Then it reads rule C, whose input section calls the vc_list() function, which in turn tries to read the file 'read_file/file_B.txt' before the workflow has even started. Naturally the file is not found, and you get the error.
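The timing problem can be reproduced in plain Python without Snakemake. In this minimal sketch, `read_values` mirrors the question's `vc_list()`, and `produce_file_b` is a hypothetical stand-in for rule A; reading the file before the producer has run fails with the same `FileNotFoundError`, while reading it afterwards works.

```python
import os
import tempfile


def read_values(path):
    # Same logic as the question's vc_list(): one value per line, whitespace stripped.
    with open(path, "r") as data_in:
        return [line.strip() for line in data_in]


def produce_file_b(path):
    # Hypothetical stand-in for rule A, which runs `seq 1 5 > {output}`.
    with open(path, "w") as out:
        out.write("\n".join(str(i) for i in range(1, 6)) + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as mydir:
        file_b = os.path.join(mydir, "file_B.txt")
        try:
            # Evaluated too early, like an input function called while the DAG is built
            read_values(file_b)
        except FileNotFoundError as err:
            print("too early:", err.strerror)
        produce_file_b(file_b)
        print("after rule A:", read_values(file_b))
```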

As for what to do, you first need to clarify the task. Most probably you are trying to define the input of a rule from information that only becomes available during the run (here, the contents of file_B.txt). In that case you need to use dynamic files or checkpoints.
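Snakemake's documented checkpoint behavior is that calling `checkpoints.<name>.get()` before the checkpoint has run postpones the input function, which is re-evaluated once the checkpoint's job has finished. The following is a toy model of that idea, not Snakemake's actual implementation; `IncompleteCheckpoint`, `ToyCheckpoint`, `resolve_input` and `write_values` are all hypothetical names.

```python
import os
import tempfile


class IncompleteCheckpoint(Exception):
    """Toy stand-in: a checkpoint's output was requested before it ran."""


class ToyCheckpoint:
    """Hypothetical model of a checkpoint: an output path plus the job that writes it."""

    def __init__(self, output, job):
        self.output = output
        self.job = job
        self.done = False

    def get(self):
        # Like checkpoints.A.get(): only succeeds once the job has finished.
        if not self.done:
            raise IncompleteCheckpoint(self.output)
        return self.output

    def execute(self):
        self.job(self.output)
        self.done = True


def resolve_input(input_fn, checkpoint):
    """Evaluate an input function, deferring it until the checkpoint has run."""
    try:
        return input_fn()
    except IncompleteCheckpoint:
        checkpoint.execute()  # run the producer first, then re-evaluate
        return input_fn()


def write_values(path):
    # Stand-in for `seq 1 2 > {output}` in checkpoint A.
    with open(path, "w") as out:
        out.write("1\n2\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        checkpoint_a = ToyCheckpoint(os.path.join(workdir, "B.txt"), write_values)

        def aggregate_input():
            with open(checkpoint_a.get()) as f:
                return [num.rstrip() + ".vc" for num in f]

        print(resolve_input(aggregate_input, checkpoint_a))
```

The key point the model shows is that the input function is allowed to fail early; the file is read only on the second evaluation, after its producer has run.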

Dmitry Kuzminov
  • Thanks, I know why it fails; that is why I explicitly asked for another implementation with the same logic. Anyway, the answer is to change rule A to a `checkpoint`, then use a function as the input of rule C that takes the wildcards, gets the file location from checkpoint A, loops through the file, and uses expand to return the list that becomes the input of C. The question was clear and people understood it; here is the credit for the answer. https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/snakemake/7J8QDqBtdA0/4utlFuzQDgAJ – Medhat Jul 28 '19 at 01:57