
In the SAM format, each alignment line represents the linear alignment of a segment, and each line has 11 mandatory fields, e.g. QNAME, FLAG, RNAME, POS, MAPQ, etc.

Let's say I want a NumPy array of all QNAMEs in a given BAM file. Alternatively, I might want to take several columns and import them into a pandas DataFrame.

Is this functionality possible with pysam?

One can open a given BAM file with pysam.AlignmentFile() and then access the fields of an individual read through a pysam.AlignedSegment object, e.g.

import pysam
seg = pysam.AlignedSegment()
print(seg.query_name)

However, can all QNAMEs be saved into a NumPy array?

Timur Shtatland
ShanZhengYang

2 Answers


Yes, that is doable. When reading a BAM file with pysam for this purpose, it is best to use the fetch() method, which returns an iterator over the reads (pysam.AlignedSegment objects) in the BAM file. The QNAME of each read is then available through its query_name attribute:

import pysam
import numpy as np

my_bam_file = '/path/to/your/bam_file.bam'
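imported = pysam.AlignmentFile(my_bam_file, mode='rb')
# until_eof=True iterates over every read in file order (including unmapped reads); no index is required
bam_it = imported.fetch(until_eof=True)
# Use head(n) instead of fetch() if you only want to retrieve the first 'n' reads
qnames = [read.query_name for read in bam_it]
imported.close()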

Here, qnames is a list of all QNAMEs in the BAM file. If you insist on getting a NumPy array, just add the following line at the end:

qnames = np.asarray(qnames)
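
The question also asks about importing several columns into a pandas DataFrame. The same iteration extends to that case; below is a minimal sketch, not a definitive implementation. The helper names collect_fields and bam_to_dataframe, as well as the FIELDS tuple, are made up for illustration; the attribute names are those of pysam.AlignedSegment. Note that reference_start is 0-based, whereas POS in SAM text is 1-based.

```python
# Hypothetical helpers for illustration; only the attribute names come from pysam.
# AlignedSegment attributes corresponding to QNAME, FLAG, RNAME, POS and MAPQ.
# reference_start is 0-based, whereas POS in SAM text is 1-based.
FIELDS = ("query_name", "flag", "reference_name",
          "reference_start", "mapping_quality")


def collect_fields(reads, fields=FIELDS):
    """Return a dict mapping each field name to a list of values, one per read."""
    columns = {name: [] for name in fields}
    for read in reads:
        for name in fields:
            columns[name].append(getattr(read, name))
    return columns


def bam_to_dataframe(bam_path, fields=FIELDS):
    """Read the chosen fields of every read in a BAM file into a pandas DataFrame."""
    # Imported here so that collect_fields() stays usable without pysam or pandas.
    import pandas as pd
    import pysam

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        # until_eof=True walks the whole file in order and needs no index.
        return pd.DataFrame(collect_fields(bam.fetch(until_eof=True), fields))
```

Calling bam_to_dataframe('/path/to/your/bam_file.bam') should then give one row per read and one column per field; a single column can be turned into a NumPy array with .to_numpy(). Everything is held in memory at once, so for very large BAM files it may be worth restricting the fields or the region passed to fetch().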
Brunox13
# pip install pyranges 
# or 
# conda install -c bioconda pyranges

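import pyranges

f = '/path/to/your/bam_file.bam'  # placeholder path to the BAM file
# as_df=True returns a pandas DataFrame instead of a PyRanges object
bam_df = pyranges.read_bam(f, sparse=False, as_df=True, mapq=0, required_flag=0, filter_flag=1540)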
The Unfun Cat