I have a BAM-file with on a certain position 520817 reads (as seen in IGV). However, when I use pysam to get the read name and the associated nucleotide on a specific position, I do not get that amount by far (only get around 7000 reads). I think I only get reads when the nucleotide on that position is different from the reference genome. Is there a workaround, so I get all the reads? I'm beginning with bioinformatics... so please let me know what you need more to help me!
Thank you very much!
This is my code:
import pysam
import csv
import sys
#---Get a table with in the first column: read-ID; second column: SNP-location; third column: nucleotide---#
mybam = pysam.AlignmentFile("file.bam", "rb")
w = csv.writer(open("snp.csv", "wb"), delimiter=",")
w.writerow(["Read", "Loc", "Nucl"])
for pileupcolumn in mybam.pileup('chr6', 29911198,29911199):
print ("\ncoverage at base %s = %s" %
(pileupcolumn.pos, pileupcolumn.n))
for pileupread in pileupcolumn.pileups:
if not pileupread.is_del:
if pileupcolumn.pos == 29911198:
w.writerow((pileupread.alignment.query_name, 29911198, pileupread.alignment.query_sequence[pileupread.query_position]))
print ('\tbase in read %s = %s' % (pileupread.alignment.query_name, pileupread.alignment.query_sequence[pileupread.query_position]))
mybam.close()