I'm looking for a simple way to create all of the sub-directories the workflow needs in one rule. However, whenever I try to execute a rule that creates all of the required directories at the top of the workflow, I get a ChildIOException that makes no sense to me:

Building DAG of jobs...
ChildIOException:
File/directory is a child to another output:
/scratch/groups/xxx/xxx/neand_sQTL/filtered_vcf
/scratch/groups/xxx/xxx/neand_sQTL/filtered_vcf/merged_filtered_chr1.vcf.gz

Here are the problematic rules:

rule mkdir_vcf:
    output:
        directory("gtex_vcf/"),
        directory("kg_vcf/"),
        directory("merged/"),
        directory("filtered_vcf/"),
        touch(".mkdir.chkpnt")
    shell:
        "mkdir -p {output}"

rule vcf_split1_23:
    input:
        vcf=config["vcf"],
        chk=".mkdir.chkpnt"
    output:
        "gtex_vcf/gtex_chr{i}.vcf"
    threads:
        23
    shell:
        "tabix -h {input.vcf} chr{wildcards.i} > {output}"

I tried using the directory() function to see if that would help with the error and it didn't. Not sure what else to do here. I can't include mkdir in vcf_split1_23 because that's a parallel job and it would be bad form to make a rule that successfully makes a directory once and erroneously 22 times. I for sure want mkdir_vcf to run before the rest of the rules.

CelineDion

1 Answer

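The ChildIOException is raised because filtered_vcf/ is declared as an output of mkdir_vcf while another rule declares a file inside that directory as its own output; Snakemake does not allow one output to be the parent directory of another. I see three options: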

  • Just run mkdir -p in the vcf_split1_23 rule (this doesn't fail when the directory already exists).
  • Make the directories with Python outside of any rule, e.g. os.makedirs("filtered_vcf", exist_ok=True), which doesn't fail on a rerun when the directory is already there.
  • Instead of specifying the directories you want to make as output, specify them as params:
    rule mkdir_vcf:
        output:
            touch(".mkdir.chkpnt")
        params:
            "gtex_vcf/",
            "kg_vcf/",
            "merged/",
            "filtered_vcf/"
        # touch() creates the flag file itself, so only the directories go to mkdir
        shell:
            "mkdir -p {params}"
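
For the second option, here is a minimal sketch of what the Python could look like. The directory names come from the rule in the question; the helper name make_workflow_dirs, the WORKFLOW_DIRS list, and the base parameter are my own placeholders, not anything Snakemake provides.

```python
import os

# Hypothetical list; the names are the directories from the question's rule.
WORKFLOW_DIRS = ["gtex_vcf", "kg_vcf", "merged", "filtered_vcf"]


def make_workflow_dirs(dirs, base="."):
    """Create each directory under base and return the resulting paths.

    exist_ok=True makes the call safe to repeat: directories that already
    exist are left untouched instead of raising FileExistsError.
    """
    created = []
    for name in dirs:
        path = os.path.join(base, name)
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created
```

Calling make_workflow_dirs(WORKFLOW_DIRS) near the top of the Snakefile, before any rule, should run it every time the workflow is parsed, so no rule has to list the directories as output and the ChildIOException conflict never comes up.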
Maarten-vd-Sande