I have a list of input files spread across different subfolders, and each folder holds a different number of files. The paths contain two wildcards (filled from SAMPLE and id), which also appear in the output names:

SAMPLE=set(["x","y","z"])

with open(config["path"]+"barcodes.txt") as f: id = [line.rstrip() for line in f]

rule all:
    input:
        expand(config["path"]+ "{sample}/remap/filtered.{sample}.R1.clean.id_{ID}.fq.bam", sample=SAMPLE, ID=id, allow_missing=True)



rule map_again:
    output:
        config["path"]+ "{sample}/remap/filtered.{sample}.R1.clean.id_{ID}.fq.bam"
    input:
        expand(config["path"]+ "{{sample}}/map/filtered.{{sample}}.R1.clean.id_{ID}.fq.gz", sample=SAMPLE, allow_missing=True)
    shell:
        "squire Map -1 {input} -r 150 -p 10 "

However, Snakemake still warns that certain combinations of the wildcards don't exist, although I had hoped it would ignore those...

How could I correct this?

Thank you very much!

SultanOrazbayev
P.Yuan
  • Remove the double braces around `{sample}` in your expand input. That is the wildcard you want to evaluate so you don't want to escape it. – Troy Comi Jul 03 '22 at 21:09
  • Thank you! I removed the extra braces, snakemake started to build DAG but now it failed because "Missing input files for rule all"... – P.Yuan Jul 03 '22 at 21:57
  • *although I hoped it to ignore these ones*: Keep in mind that `expand` is just a convenience function that returns a list of strings from combinations of wildcards. If you need some *ad hoc* combinations that `expand` cannot easily handle, just prepare that list yourself using standard python code. – dariober Jul 04 '22 at 08:16

1 Answer

The allow_missing kwarg in the current version of rule all is redundant, so the rule can be written without it:

rule all:
    input:
        expand(config["path"]+ "{sample}/remap/filtered.{sample}.R1.clean.id_{ID}.fq.bam", sample=SAMPLE, ID=id)

This is because allow_missing is just a convenience kwarg that lets you pass only some of the wildcards to the expansion. The wildcards you leave out stay in the result as unexpanded placeholders. Example borrowed from this answer:

expand("text_{letter}_{num}.txt", num=[1, 2], allow_missing=True)
# ["text_{letter}_1.txt", "text_{letter}_2.txt"]
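
To see why a full expansion produces combinations that may not exist on disk, here is a minimal plain-Python sketch of what a full expansion amounts to. It is not Snakemake's implementation: `simple_expand` is a hypothetical stand-in, and the shortened path pattern is only for illustration.

```python
from itertools import product


def simple_expand(pattern, **wildcards):
    """Hypothetical stand-in for expand(): fill the pattern with every
    combination of the given wildcard values (a cartesian product)."""
    names = list(wildcards)
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*(wildcards[name] for name in names))
    ]


# Every sample is paired with every ID, whether or not the file exists.
targets = simple_expand(
    "{sample}/remap/id_{ID}.bam", sample=["x", "y"], ID=["a", "b"]
)
# ['x/remap/id_a.bam', 'x/remap/id_b.bam', 'y/remap/id_a.bam', 'y/remap/id_b.bam']
```

Nothing in this step looks at the filesystem, so if sample y has no file for ID a, that target is still requested and Snakemake reports the missing input.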

From the statement of the question, it seems that you would like to ignore missing combinations of files. One way to achieve this is to define a custom function and provide it as input. For example:

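from glob import glob

def find_available_files(wildcards):
    # Pattern for the mapped reads of one sample; ID is left open for globbing.
    path = config["path"] + "{sample}/map/filtered.{sample}.R1.clean.id_{ID}.fq.gz"
    # Replace ID with "*" so glob returns every file that exists for this sample.
    files = glob(path.format(sample=wildcards.sample, ID="*"))
    return files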

rule map_again:
    output:
        config["path"]+ "{sample}/remap/filtered.{sample}.R1.clean.id_{ID}.fq.bam"
    input:
        find_available_files
    shell:
        "squire Map -1 {input} -r 150 -p 10 "
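
The same idea can be applied to rule all, which otherwise still asks for every sample/ID combination. Below is a minimal sketch that builds the target list only from inputs that exist on disk. The helper name `existing_targets` is hypothetical, and it assumes (as in the question) that the base path ends with a slash and that the `map` files are already present before Snakemake builds the DAG.

```python
import re
from pathlib import Path


def existing_targets(base, samples):
    """Return remap BAM targets only for (sample, ID) pairs whose
    mapped input file exists. Assumes `base` ends with a slash."""
    targets = []
    for sample in sorted(samples):
        map_dir = Path(base) / sample / "map"
        # Capture the ID part of names like filtered.x.R1.clean.id_<ID>.fq.gz
        name_re = re.compile(
            rf"filtered\.{re.escape(sample)}\.R1\.clean\.id_(.+)\.fq\.gz$"
        )
        for path in sorted(map_dir.glob(f"filtered.{sample}.R1.clean.id_*.fq.gz")):
            match = name_re.match(path.name)
            if match:
                targets.append(
                    f"{base}{sample}/remap/"
                    f"filtered.{sample}.R1.clean.id_{match.group(1)}.fq.bam"
                )
    return targets
```

In the Snakefile, the input of rule all could then be `existing_targets(config["path"], SAMPLE)` instead of the `expand` call. If the `map` files are produced by another rule in the same run, this will not see them, because the glob runs when the Snakefile is parsed.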
SultanOrazbayev