
Hello, I want to run a loop over a set of files, but only over certain files in a directory rather than all of them.

Here is the command I use. It is a bowtie2-based alignment of genomic sequences:

    for i in *1.fastq.gz
    do
        # Strip the read-1 suffix to get the sample basename
        base=$(basename "$i" "_1.fastq.gz")
        bowtie2 -p 8 -x /mnt/path/contigs -1 "${base}_1.fastq.gz" -2 "${base}_2.fastq.gz" | samtools view -b -o "${base}.bam" -
    done

With this command, bowtie2 aligns all of my files. However, the folder also contains files whose bowtie2 analysis is already complete, and I don't want bowtie2 to analyze those again. Is there anything I can add to this loop to skip certain files?

Valentin
  • For the future, fix your process so the processed files get moved to a different directory. You don't specify anything about the files you want to skip. Could you rename them to something like `*1a.fastq.gz` (for instance)? Then they will not be processed. Otherwise you'll have to test the name inside the loop, but that will be tricky. Best to move or rename files in bulk so they don't match `*1.fastq.gz`. (If you can make a file with a list of the files that should be excluded, then update your question with a small sample and we can likely help.) Good luck. – shellter Nov 02 '20 at 04:11
  • How many fastq files do you want to run? If there aren't many, you can specify each of them for your for loop, e.g. `for i in R001.fastq.gz R002.fastq.gz R003.fastq.gz` etc – jared_mamrot Nov 02 '20 at 04:12

1 Answer


Create two files, each with one basename per line: (1) your inputs, here the base names of the read 1 fastq files, and (2) your existing outputs, here the base names of the bam files. Sort both files and use `comm -23 file1 file2 > file3` to select only the basenames that have not been mapped yet. Then loop over the basenames saved in file3.
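To see what `comm -23` does before running it on real data, here is a toy sketch. The sample names (sampleA, sampleB, sampleC) and the temporary directory are made up for illustration; only the `comm -23 file1 file2 > file3` step comes from the description above.

```shell
# Toy demonstration of comm -23; the sample names are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# file1: all input basenames; file2: basenames that already have a bam.
printf '%s\n' sampleC sampleA sampleB | sort > file1
printf '%s\n' sampleB sampleA | sort > file2

# -2 drops lines found only in file2, -3 drops lines found in both,
# so only the lines unique to file1 remain.
comm -23 file1 file2 > file3
cat file3
```

Both inputs must be sorted the same way (same locale), otherwise `comm` may report lines as unique when they are not.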

Quick and dirty solution (assuming the filenames do not have whitespace):

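# Escape the dots and anchor the pattern so only the trailing suffix is removed
ls -1 *_1.fastq.gz | perl -pe 's/_1\.fastq\.gz$//' | sort > in.basenames.txt
ls -1 *.bam | perl -pe 's/\.bam$//' | sort > out.basenames.txt
# Keep only the basenames that appear in the inputs but not in the outputs
comm -23 in.basenames.txt out.basenames.txt > todo.in.basenames.txt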

while read -r base_name ; do
    bowtie2 -1 "${base_name}_1.fastq.gz" -2 "${base_name}_2.fastq.gz" ...
done < todo.in.basenames.txt
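
A different approach from the `comm` one above, in case the intermediate list files are not wanted: test for the output file inside the loop and skip the sample when it is already there. This is only a sketch. The helper name `pending_basenames` is made up, the bowtie2 and samtools options are copied from the question, and it assumes that a non-empty `${base}.bam` means the sample is done (a run that was interrupted midway could leave a partial bam that this check would wrongly skip).

```shell
# Hypothetical helper: print the basename of every *_1.fastq.gz in the
# current directory that does not yet have a non-empty bam next to it.
pending_basenames() {
    for fq in *_1.fastq.gz; do
        [ -e "$fq" ] || continue              # glob matched nothing
        base=$(basename "$fq" _1.fastq.gz)
        [ -s "${base}.bam" ] && continue      # output already exists, skip
        printf '%s\n' "$base"
    done
}

# Align only the samples that are still pending (options as in the question).
pending_basenames | while read -r base; do
    bowtie2 -p 8 -x /mnt/path/contigs -1 "${base}_1.fastq.gz" -2 "${base}_2.fastq.gz" |
        samtools view -b -o "${base}.bam" -
done
```

Running `pending_basenames` on its own first is a cheap way to check which samples would be aligned before starting the actual jobs.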
Timur Shtatland