Extracting ITS Region from Paired-end FASTQ Data Using a Reference Genome: Guidance Needed

Question

I have 20 isolates labeled as A_1 to A_50. For each isolate, I have paired-end FASTQ files. I also have a reference genome file. I would like to extract the sequence of the ITS (Internal Transcribed Spacer) region from each isolate using this reference genome. The coordinates of the ITS region in the reference genome are known. Can someone please guide me on how to perform this extraction for each isolate?

I used BWA-MEM to align the paired-end FASTQ data for each sample. The command I used was:

bwa mem -t ${Threads} -M -R "@RG\tID:${SampleName}\tLB:${SampleName}\tPL:ILLUMINA\tSM:${SampleName}" ${Reference} ${SampleName}/*_R1.fastq.gz ${SampleName}/*_R2.fastq.gz | samtools view -bS - > ${output_dir}/${SampleName}.bam

This command aligns the reads to the reference genome, and I specified the number of threads to use, as well as the read group information. After aligning, I sorted the resulting BAM file using samtools sort and created an index using samtools index.
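For reference, the sort and index steps described above can be sketched roughly as follows. This is only an illustration: the function names, the `.sorted.bam` naming, and the `DRY_RUN` switch (which prints the commands instead of running them) are assumptions made for this sketch, not part of the original commands.

```shell
#!/usr/bin/env bash
# Sketch only: sort and index one sample's BAM after the bwa mem step.
# File naming (<outdir>/<sample>.bam -> <outdir>/<sample>.sorted.bam) is assumed.

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

sort_and_index() {
    local sample="$1" outdir="$2" threads="${3:-1}"
    # Coordinate-sort the alignments, then build the .bai index.
    run samtools sort -@ "$threads" -o "${outdir}/${sample}.sorted.bam" "${outdir}/${sample}.bam"
    run samtools index "${outdir}/${sample}.sorted.bam"
}

# Example call (not executed here):
#   sort_and_index "${SampleName}" "${output_dir}" "${Threads}"
```

Running it with `DRY_RUN=1` first is a cheap way to check the file paths before any real work is done.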

After that, however, I don't know how I should proceed to extract the ITS region.
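One direction that might work from here, given as a sketch under stated assumptions rather than a verified pipeline: pull the reads overlapping the known ITS coordinates with `samtools view`, and build a per-isolate consensus for that interval with `samtools consensus` (assumed to be available, which needs samtools 1.16 or newer). The region string `chrN:START-END` is a placeholder for the real coordinates, and the function names, output file names, and `DRY_RUN` switch are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: extract the ITS interval from a sorted, indexed BAM.
# Assumes <outdir>/<sample>.sorted.bam and its index already exist.

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

extract_its() {
    local sample="$1" outdir="$2" region="$3"   # region like chrN:START-END
    local bam="${outdir}/${sample}.sorted.bam"
    # Keep only the reads that overlap the ITS interval.
    run samtools view -b -o "${outdir}/${sample}.ITS.bam" "$bam" "$region"
    run samtools index "${outdir}/${sample}.ITS.bam"
    # Call a consensus sequence for the interval in FASTA format.
    run samtools consensus -f fasta -r "$region" -o "${outdir}/${sample}.ITS.fasta" "$bam"
}

# Example call (not executed here; replace the placeholder region):
#   extract_its "${SampleName}" "${output_dir}" "chrN:START-END"
```

The consensus is only as good as the read coverage over the interval, so it is worth checking depth in that region (for example with `samtools depth -r`) before trusting the FASTA output.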
