1

I have PCR-Amplified fastq files of a specific target region from several samples. For each sample, I want to know the percentage of reads that align better to reference sequence #1 or #2 posted below. How should I begin to tackle this question and what tool for alignment is best?

I am working with Illumina paired-end adapter sequences spiked-in on a 2X150 run. The two reference amplicons are 173 and 179 bp:

1: aaaaagtataaatataggaccaggcagagcattttatacaacaggagaaataataggagatataagacaagcacattgtaaccttagtagagcaaaatggaatgacactttaaataagatagttataaaattaagagaacaatttgggaataaaacaatagtctttaagcact

2: aaaaagtatccgtatccagaggggaccagggagagcatttgttacaataggaaaaataggaaatatgagacaagcacattgtaacattagtagagcaaaatggaatgccactttaaaacagatagctagcaaattaagagaacaatttggaaataataaaacaataatctttaagcaat

We want to know if one virus wins over another after infection infection based off of the differences between these two sequences; so essentially the percentage that align best to #1 and the percentage that align best to #2.

Thank You,

Sara

  • If you do not get here quickly an answer that you need, try posting the question on [Bioinformatics Stack Exchange](https://bioinformatics.stackexchange.com/) or [Biostars](https://www.biostars.org/). – Timur Shtatland Dec 21 '22 at 16:44

1 Answers1

0
  • Convert your reference amplicons to fasta format.
  • Choose an aligner, such as bwa mem, bowtie2, etc.
  • Index the reference for your aligner of choice.
  • Align the reads to the reference using your aligner of choice.
  • Use samtools idxstats to find the number of reads aligned to each of the amplicons.

Notes:

  • It is often a good idea to trim adapters from the reads before you align the reads. A number of good adapter trimmers exist, such as flexbar, skewer, etc.
  • Many popular bioinformatics packages mentioned above can be easily installed, for example using conda.

REFERENCES:

conda
bwa
bowtie2
samtools
flexbar
skewer

Timur Shtatland
  • 12,024
  • 2
  • 30
  • 47