Do you know how to run Snakemake on specific combinations of files? For example, these txt files each contain a list of sequence IDs:

bob.txt 
steve.txt 
john.txt

From these FASTA files I want to extract the sequences whose IDs are listed in the txt files above:

bob.fa
steve.fa
john.fa

So the sequence IDs from bob.txt should be looked up in bob.fa, those from john.txt in john.fa, and so on.

workdir: "/path/to/dir/"
(SAMPLES,) =glob_wildcards('path/to/dir/{sample}.fa')

rule all:
    input: 
        expand("{sample}.unique.fa", sample=SAMPLES)

rule seqkit:
    input:
        infa ="path/to/dir/{sample}.fa"
        intxt = "path/to/dir/{sample}.txt
    output:
        outfa = "{sample}.unique.fa"
    shell:
        ("/Tools/seqkit grep -f {input.intxt} {input.infa} > {output.outfa}")

So I do not need all combinations, only specific ones, like bob.txt with bob.fa and steve.txt with steve.fa, because my current code will also run bob.txt against steve.fa.

user3224522
  • Looks to me like you have the right idea. Please add some info on how your current Snakefile is not doing what you want. – MSR Dec 20 '19 at 16:09
  • I don't want it to run all combinations, because that is time- and memory-consuming, only the combinations bob.txt with bob.fa, steve.txt with steve.fa... I have no idea how to do this, or whether it is possible at all. – user3224522 Dec 20 '19 at 17:29
  • In general, you can use `zip` within `expand` to zip the lists expanded (instead of getting the product). In your case, this should not be necessary. When I remove the typos in your Snakefile and run `snakemake -np`, Snakemake attempts to produce `steve.unique.fa` from `steve.fa` and `steve.txt`, and likewise for `john` and `bob`, as you desire? – MSR Dec 20 '19 at 17:36
  • Can you show us your output with the `-p` flag to print shell commands? As @MSR mentioned, your current code should already produce the output you are expecting. – Manavalan Gajapathy Dec 20 '19 at 19:16
  • Sorry guys, you are right, I am so stupid :D. I have an additional question; if you could answer it I would appreciate it, otherwise I can open a new question. Suppose my txt files were named bob_steve.txt, steve_john.txt, john_bob.txt and so on, and I wanted to decide which txt file is searched against which FASTA file. Is that possible, for example by creating a list? – user3224522 Dec 21 '19 at 11:02
  • As I said above, you would then need to use `zip` within `expand`. If you have problems with this, feel free to open a separate question. – MSR Dec 21 '19 at 11:29
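
To illustrate the `zip` suggestion from the comments, here is a minimal plain-Python sketch of the difference between the default product expansion and the element-wise pairing. It does not import Snakemake; it only mimics what `expand` would return. The list names `txts` and `fastas`, the output pattern, and the pairing itself (bob_steve with bob, and so on) are assumptions made for illustration, not something stated in the question.

```python
from itertools import product

# Hypothetical pairing for the follow-up case: the i-th ID list is searched
# against the i-th FASTA file. Adjust the two lists to the pairing you want.
txts = ["bob_steve", "steve_john", "john_bob"]
fastas = ["bob", "steve", "john"]

# Hypothetical output naming scheme, chosen only for this example.
pattern = "{txt}.{fa}.unique.fa"

# What expand(pattern, txt=txts, fa=fastas) would give: every combination.
all_combinations = [pattern.format(txt=t, fa=f) for t, f in product(txts, fastas)]

# What expand(pattern, zip, txt=txts, fa=fastas) would give: element-wise pairs.
paired = [pattern.format(txt=t, fa=f) for t, f in zip(txts, fastas)]

print(len(all_combinations))  # 9 targets
print(paired)                 # 3 targets, one per (txt, fasta) pair
```

In a Snakefile, the paired list would then go into `rule all` as `expand("{txt}.{fa}.unique.fa", zip, txt=txts, fa=fastas)`, with a rule whose input and output use both the `{txt}` and `{fa}` wildcards.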

1 Answer

A comma is missing between the two entries in the input of rule seqkit, and the intxt string is missing its closing quote.

rule seqkit:
    input:
        infa = "path/to/dir/{sample}.fa",
        intxt = "path/to/dir/{sample}.txt"
Manavalan Gajapathy