0

I'm trying to run a bioinformatics command line tool on all .bam files in a directory. This is what I'm using:

#!/bin/sh

reference='/path/Homo_sapiens_assembly19.fasta'

for f in *.bam
do 
    base_name=${f%.bam}
    java -jar /ppath/GenomeAnalysisTK.jar -R $reference \
   -T ASEReadCounter \
   -o $base_name.csv \
   -I $f \
   -sites $base_name.vcf \
   -U ALLOW_N_CIGAR_READS \
   -minDepth 10 \
   --minMappingQuality 10 \
   --minBaseQuality 2
done;

The problem is that the loop stops after iterating over the first bam file. I will eventually like this to go over a set of 2000 .bam files, and I don't want to have to enter them all manually (it will take >30 hours).

Timur Shtatland
  • 12,024
  • 2
  • 30
  • 47
SinghG
  • 11
  • 4
  • How do you know it stops after the first file? – melpomene Jun 19 '16 at 23:08
  • I only get 1 output file, and in the terminal, I only see a call to the first .bam file ie. substitute base_name and $f for sample1. @John1024: No the script does not. That was a formatting error. I know that the script works in the sense that it gives me exactly what I want for 1 bam in the directory. – SinghG Jun 19 '16 at 23:14
  • What do you mean by "call to the first .bam file"? – melpomene Jun 19 '16 at 23:15
  • Sorry, I wasn't clear. The directory has files named sampleN.bam/sampleN.vcf for N = 1,2.... I see that it executes the desired command for sample1.bam, sample1.vcf, but not for any subsequent (bam,vcf) pair. – SinghG Jun 19 '16 at 23:17
  • How do you see that it executes the command? – melpomene Jun 19 '16 at 23:17
  • 3
    Btw. `bash` != `sh` – Cyrus Jun 19 '16 at 23:18
  • @melpomene when the software tool runs, it displays the entire command: – SinghG Jun 19 '16 at 23:19
  • So ... have you attempted any debugging? – melpomene Jun 19 '16 at 23:20
  • @Cyrus Thanks! Honestly, I didn't even know how to change directories until Monday of last week. All very new for me. – SinghG Jun 19 '16 at 23:20
  • @melpomene I'm not quite sure what to attempt. It took me a few hours of fiddling just to get the desired output. I've seen some people iterate over files using a file list. Maybe I'll try that. – SinghG Jun 19 '16 at 23:25
  • This might help: [How to debug a bash script?](http://unix.stackexchange.com/q/155551/74329) – Cyrus Jun 19 '16 at 23:29
  • There's also a handy shell script tool online you might find useful: http://shellcheck.net – l'L'l Jun 19 '16 at 23:33
  • So the shellcheck.net said "Double quote to prevent globbing and word splitting" whenever $ was used in the script. I'm now running it with the double quotes around those regions. – SinghG Jun 19 '16 at 23:47
  • Adding double quotes to those regions didn't change anything. Still only ran once. – SinghG Jun 19 '16 at 23:57
  • Okay, I figured it out. Basically, I was running this as a test so that when I need to do it later this week, I'm ready to do it. The problem is, I only have 1 bam file, the second is a copy. When I copied it, it was copied as a text file with bam extension. When I remove the original bam file, I get an error where no .bams are input into the software. I will download and clean another bam file to test. – SinghG Jun 20 '16 at 00:22

1 Answers1

0

Try the following:

#!/bin/bash

reference='/path/Homo_sapiens_assembly19.fasta'

for f in $(ls ./*.bam); do
    base_name=${f%.bam} 
    #base_name=$(basename ${f})  # alternatively you can use this
    java -jar /ppath/GenomeAnalysisTK.jar -R ${reference} \
   -T ASEReadCounter \
   -o ${base_name}.csv \
   -I ${f} \
   -sites ${base_name}.vcf \
   -U ALLOW_N_CIGAR_READS \
   -minDepth 10 \
   --minMappingQuality 10 \
   --minBaseQuality 2
done;

I suspect your -o $base_name.csv was looking for a variable called basename.csv and not basename, thus overwriting the output file, making it appear as if only a single bam file was processed. This should be easy to fix by calling your bash variables by using ${basename}, before extending them with a suffix.

ljc
  • 943
  • 2
  • 10
  • 26
  • The original script ended up working. The problem was that my second bam file in the directory was actually a txt file. I ran it again with two genuine bam files, and it worked. – SinghG Jun 21 '16 at 20:21