1

I have a BAM-file with on a certain position 520817 reads (as seen in IGV). However, when I use pysam to get the read name and the associated nucleotide on a specific position, I do not get that amount by far (only get around 7000 reads). I think I only get reads when the nucleotide on that position is different from the reference genome. Is there a workaround, so I get all the reads? I'm beginning with bioinformatics... so please let me know what you need more to help me!

Thank you very much!

This is my code:

import pysam
import csv
import sys

#---Get a table with in the first column: read-ID; second column: SNP-location; third column: nucleotide---#
mybam = pysam.AlignmentFile("file.bam", "rb")
w = csv.writer(open("snp.csv", "wb"), delimiter=",")
w.writerow(["Read", "Loc", "Nucl"])
for pileupcolumn in mybam.pileup('chr6', 29911198,29911199):
    print ("\ncoverage at base %s = %s" %
           (pileupcolumn.pos, pileupcolumn.n))
    for pileupread in pileupcolumn.pileups:
        if not pileupread.is_del:
            if pileupcolumn.pos == 29911198:
                w.writerow((pileupread.alignment.query_name, 29911198, pileupread.alignment.query_sequence[pileupread.query_position]))             
                print ('\tbase in read %s = %s' % (pileupread.alignment.query_name, pileupread.alignment.query_sequence[pileupread.query_position]))

mybam.close()
  • Are you sure about 520817 reads in one position? Sounds quite high. Where in IGV did you get the value? I just compared the values ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/exome_alignment/HG00096.chrom11.ILLUMINA.bwa.GBR.exome.20120522.bam with pysam and IGV and they are identical. – Maximilian Peters Sep 24 '16 at 20:19
  • Hey Maximilian, I looked in the coverage track in IGV. Almost all the reads are overlapping and have similar start en end positions. When I fetch, I get correct estimations. However, when doing pileup, this does not work and I get a lot less reads... – Michaël Noë Sep 25 '16 at 04:09
  • Can you try your code on some public data, e.g. the BAM file in my previous comment? It's a lot easier to work to troubleshoot with identical data. – Maximilian Peters Sep 25 '16 at 06:43

1 Answers1

1

Check IGV option View-->Preference-->Alignment, some "filter xxxx" options ( duplicates, secondary alignments, low quality ) may change the output.

Usually pysam doesn't pileup reads with BAM_FUNMAP, BAM_FSECONDARY, BAM_FQCFAIL, BAM_FDUP flags, so make sure your IGV view options are same as the options in pysam.AlignmentFile.pileup. Otherwise they could generate different outputs.

woodword
  • 21
  • 3