1

I'm currently trying to import a .fasta file into bwa to use a reference genome to map my reads to. However, i currently am getting this error:

[E::bwa_idx_load_from_disk] fail to locate the index files

Any help? Here is my code:

#!/bin/bash

source /opt/asn/etc/asn-bash-profiles-special/modules.sh
module load fastqc/0.10.1
module load fastx/0.0.13
source /opt/asn/etc/asn-bash-profiles-special/modules.sh
module load pear/0.9.10
source /opt/asn/etc/asn-bash-profiles-special/modules.sh
module load fastqc/0.10.1
module load fastx/0.0.13   
module load bwa/0.7.12
module load samtools/1.2
source /opt/asn/etc/asn-bash-profiles-special/modules.sh
module load trimmomatic/0.35

r=20
####mapping
#Indexing reference library for BWA mapping:
bwa index -a is ~/gz_files/sample_things/fungiref.fa fungiref

bwa mem fungiref sample${r}_clipped_paired.assembled.fastq > sample${r}.sam

#sort and convert to bam
samtools view -bS sample${r}.sam | samtools sort - sample{r}_sorted

#counts and stats
samtools index sample${r}_sorted.bam
samtools idxstats sample${r}_sorted.bam > ${r}_counts.txt
Braiam
  • 1
  • 11
  • 47
  • 78
Haley
  • 119
  • 1
  • 3
  • 16
  • You might find more accurate help on their [man page](http://bio-bwa.sourceforge.net/bwa.shtml) or [mailing list](https://sourceforge.net/p/bio-bwa/mailman/) – M. Becerra Jun 04 '17 at 20:45
  • You might be interested in the following bioinformatics beta stackexchange site: https://area51.stackexchange.com/proposals/109245/bioinformatics/visit – bli Jun 05 '17 at 08:47

1 Answers1

4

The usage for bwa index is

bwa index [-p prefix] [-a algoType] <in.db.fasta>

Your usage doesn’t match this; it’s a shame that bwa silently accepts this instead of immediately throwing an error. Annoyingly, there is also simply no way to specify a path prefix for the index. You’re stuck with the location of your reference.

At any rate, the index filename is derived from the FASTA reference file. As a consequence you need to adjust your index filename in the subsequent command:

bwa mem ~/gz_files/sample_things/fungiref.fa sample${r}_clipped_paired.assembled.fastq > sample${r}.sam
Konrad Rudolph
  • 530,221
  • 131
  • 937
  • 1,214
  • Thank you for your help! One last question, not sure if you'd know: When mapping back to the reference, how closely does a sequence have to match to be considered "mapped"? I am trying to compare different databases, and I need to know the confidence in which BWA maps with. I have scoured the internet and haven't been able to find anything. – Haley Jun 06 '17 at 14:45
  • @Haley you control that with several options you pass to `bwa mem`. The parameters unfortunately aren’t trivial … the best way is to consult the explanation of the BWA MEM algorithm in the publication. At any rate, the default cutoff for the match score (`-T`) is 30. This score is calculated using the match score (`-A`), mismatch penalty (`-B`), and the gap opening (`-O`) and extension (`-E`) penalty, as well as the clipping penalty (`-L`) and the penalty for unpaired reads (`-U`). BWA MEM is quite idiosyncratic in this: usually the clipping penalty is 0, and unpaired reads are pass/fail. – Konrad Rudolph Jun 06 '17 at 15:08
  • I thought as such.. do you have any reccomendations for mapping programs that use a high cutoff for what they considered mapped? I have short reads from an environmental sample. – Haley Jun 06 '17 at 16:20
  • @Haley I don’t know how I would parametrise BWA MEM, since reads may have variable length so an absolute cutoff is pretty impractical. However, I would generally just set the cutoff to be very close to the read length (e.g. such that a one or two indels are tolerated). – Konrad Rudolph Jun 06 '17 at 16:35