I have a pair of Illumina paired-end read files (say, A_1.fastq.gz and A_2.fastq.gz) produced from a single bacterial isolate for variant calling. First, I used FLASH to merge overlapping reads, given the read length (100 bp), the insert size (about 230 bp) and its standard deviation (about 50 bp). FLASH produced three read files: two for the non-overlapping paired-end reads and one for the merged (single-end) reads. I then aligned them against a common reference genome using bowtie, which generated two BAM files (one for the paired-end reads and the other for the single-end reads).

To gain higher coverage and read depth for variant calling, I would like to merge both BAM files into a single one. I plan to use BamTools for this task, as it is dedicated to handling BAM files. However, I am not sure whether the input BAM files need to be sorted before calling the "bamtools merge" command. This is not covered in the software tutorial or elsewhere. I would appreciate any help.

zx8754
Wan

1 Answer

Well, it is a merge, so by definition the inputs have to be sorted. Otherwise it would not be a merge.

Merging is the action of joining two or more sorted lists while preserving the ordering. The advantage of a merge is that no extra sorting step is needed when the inputs are already sorted.
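To make the idea concrete, here is a minimal sketch in plain Python. It does not use BamTools or read any BAM file; the alignments are hypothetical (reference_index, position) tuples invented for illustration, and `heapq.merge` from the standard library stands in for a generic merge of sorted inputs.

```python
import heapq

# Hypothetical alignment coordinates as (reference_index, position) tuples.
# Both lists are already coordinate-sorted.
paired = [(0, 100), (0, 450), (1, 20)]
single = [(0, 230), (1, 5), (1, 300)]

# Merging sorted inputs gives a sorted output in a single pass,
# without any extra sorting step.
merged = list(heapq.merge(paired, single))

# If one input is NOT sorted, the same procedure silently yields
# an output that is not sorted either.
unsorted_paired = [(0, 450), (0, 100), (1, 20)]
broken = list(heapq.merge(unsorted_paired, single))

print(merged == sorted(merged))   # True
print(broken == sorted(broken))   # False
```

This only illustrates the general principle; how a particular tool reacts to unsorted input is a separate matter and should be checked against that tool's documentation.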

If the inputs are not sorted, you can either concatenate them and sort the final result, or sort each input first and then merge the sorted intermediate results.
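The two options lead to the same result, which can be sketched in plain Python. Again, the records are hypothetical (reference_index, position) tuples and the function names are made up for this illustration; nothing here calls BamTools or any other BAM library.

```python
import heapq

def concat_then_sort(*inputs):
    """Option 1: concatenate all inputs, then sort the combined result once."""
    combined = []
    for records in inputs:
        combined.extend(records)
    return sorted(combined)

def sort_then_merge(*inputs):
    """Option 2: sort each input separately, then merge the sorted results."""
    return list(heapq.merge(*(sorted(records) for records in inputs)))

# Hypothetical unsorted alignment coordinates.
paired_unsorted = [(1, 20), (0, 450), (0, 100)]
single_unsorted = [(1, 300), (0, 230), (1, 5)]

result_a = concat_then_sort(paired_unsorted, single_unsorted)
result_b = sort_then_merge(paired_unsorted, single_unsorted)

print(result_a == result_b)  # True
```

With real BAM files, the second option is usually attractive when the individual files are large, because each one can be sorted independently before the cheap merge step.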

BTW, it is quite likely that the merge command will complain if you feed it unsorted BAM files.

Poshi