I have the Snakefile below, which uses checkpoints. I am trying to run it for 2 samples (defined as RUNS). However, every time I try, an additional variable gets included. Any thoughts on how to resolve this? Thank you.

import os
from tempfile import TemporaryDirectory

configfile: "config/CONFIG.yaml"
DATA_DIR = config["data_dir"]
RESULTS_DIR = config["results_dir"]
DB_DIR=config["db_dir"]
RUNS=["S1_select", "S3_select"]
BARCODES=config["no_barcode"]


rule all:
    input: expand(os.path.join(RESULTS_DIR, "basecalled/{run}/{barcode}.fastq.gz"), run=RUNS, barcode=BARCODES)

checkpoint guppy_gpu_basecall:
    input: os.path.join(DATA_DIR, "multifast5/{run}")
    output: directory(os.path.join(RESULTS_DIR, "basecalled/{run}"))    #folder with many files
    log: os.path.join(RESULTS_DIR, "basecalled/{run}/basecalling")
    threads: config["guppy_gpu"]["cpu_threads"]
    shell:
        """
        run_guppy
        """

rule intermediate_basecalling:
    input: os.path.join(RESULTS_DIR, "basecalled/{run}/{i}.fastq.gz")
    output: os.path.join(RESULTS_DIR, "basecalled/{run}/no_nobarcode/{i}.fastq.gz")
    log: os.path.join(RESULTS_DIR, "basecalled/{run}/no_barcode_{i}")
    shell:
        """
        (date &&\
        ln -s {input} {output}  &&\
        date) 2> >(tee {log}.stderr) > >(tee {log}.stdout)
        """

def aggregate_dummy_basecalling(wildcards):
    checkpoint_output = checkpoints.guppy_gpu_basecall.get(**wildcards).output[0]
    return expand(os.path.join(RESULTS_DIR, "basecalled/{run}/no_nobarcode/{i}.fastq.gz"),
        run=wildcards.run,
        i=glob_wildcards(os.path.join(checkpoint_output, "{i}.fastq.gz")).i)

rule merge_individual_fastq_per_barcode:
    input: aggregate_dummy_basecalling
    output: os.path.join(RESULTS_DIR, "basecalled/{run}/{barcode}/{barcode}.fastq.gz")
    shell:
        """
        date
        cat $(find $(dirname {output}) -name "*.fastq.gz" | sort) > {output}
        touch {output}
        date
        """

I'm getting the following error:

Missing input files for rule guppy_gpu_basecall:
data/multifast5/S1_select/no_barcode.fastq.gz

Thank you for your pointers!

Susheel Busi
    It sometimes happens to me that the wildcards are not correctly delimited, and this is usually resolved by setting wildcard constraints. But I'm not sure this is your case here... – bli May 08 '20 at 08:54

    Yeah, I tried wildcard constraints, but then downstream it gave me a `TypeError: 'Checkpoint' object is not callable Wildcards: run=S1_SizeSelected barcode=no_barcode` – Susheel Busi May 08 '20 at 11:16
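
One way to read the error, building on the wildcard-constraint comment above: the target `basecalled/S1_select/no_barcode.fastq.gz` requested by `rule all` can match the checkpoint's output pattern `basecalled/{run}`, because an unconstrained Snakemake wildcard is matched with the regex `.+`, which also matches `/`. The plain-Python sketch below only imitates that matching. `pattern_to_regex` is a hypothetical helper (not Snakemake's API), and `results` is an assumed value for `RESULTS_DIR`.

```python
import re


def pattern_to_regex(pattern, constraints=None):
    """Hypothetical helper: turn a pattern with {name} wildcards into a regex.

    Assumption: an unconstrained wildcard behaves like ".+", so it can span "/".
    Only handles patterns in which each wildcard name appears once.
    """
    constraints = constraints or {}
    # re.split with a capture group yields: literal, name, literal, name, ...
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for idx, part in enumerate(parts):
        if idx % 2 == 0:
            regex += re.escape(part)  # literal path text
        else:
            regex += "(?P<%s>%s)" % (part, constraints.get(part, ".+"))
    return re.compile(regex + "$")


# "results" is an assumed RESULTS_DIR; the real value lives in CONFIG.yaml.
target = "results/basecalled/S1_select/no_barcode.fastq.gz"
checkpoint_output = "results/basecalled/{run}"

# Unconstrained: {run} swallows the file name as well as the sample name.
greedy = pattern_to_regex(checkpoint_output).match(target)
greedy_run = greedy.group("run")
print(greedy_run)  # -> S1_select/no_barcode.fastq.gz

# The checkpoint input is then built from that over-long wildcard value.
missing_input = "data/multifast5/" + greedy_run
print(missing_input)  # -> data/multifast5/S1_select/no_barcode.fastq.gz

# Constrained: {run} may not contain "/", so the checkpoint no longer matches.
constrained = pattern_to_regex(checkpoint_output, {"run": "[^/]+"}).match(target)
print(constrained)  # -> None
```

If this is the cause, a global `wildcard_constraints: run="[^/]+"` in the Snakefile should stop `{run}` from swallowing the file name. The target in `rule all` would then still need to match the output of some rule. As written, the merge rule produces `{run}/{barcode}/{barcode}.fastq.gz`, not `{run}/{barcode}.fastq.gz`.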
