Extract read names and the associated nucleotides on specific positions from a BAM file (in R)

Question

Let's assume I have a BAM file and several positions that I would like to examine more closely in this alignment. My goal is to find out whether these positions are on the same reads and which nucleotides each read has in these two positions. So my output should be a table of all reads that occur in the positions and list the associated nucleotides.

The output should look like this (with 980, 1350 and 1400 the first three positions; I made up the read names):

read name	980	1350	1400	...
ReAdN4m1L03	A	T		...
ReAdN4m2L02	A	T	G	...
ReAdN4m3L01		T	G	...
...	...	...	...	...

At the end I would like to have a table that links the nucleotides via the reads.

So far, I have tried to read in the Bam file at the desired positions:

#Positions of interest
pos <- c(980, 1350, 1400)

#path to BAM file
path <- pathToBamFile

#name of reference used for mapping
refname <- "myreference"

#loading the bam file
bamFile <- BamFile(path)
gr <- GRanges(seqnames = refname, ranges = IRanges(start = pos, end = pos))
params <- ScanBamParam(which = gr, what = scanBamWhat())
aln <- scanBam(bamFile, param = params)

With the following code I can access the reads, but how do I get the corresponding nucleotides in the individual positions?

pos.no <- c(1:length(pos))

i <- 1
aln[[pos.no[i]]]$qname

I have already tried to work with the CIGAR strings aln[[i]]$cigar with the sequences aln[[i]]$seq in order to extract to the nucleotides manually, but that led to more and more problems.

Is there any easy way to extract the nucleotides?

I have searched for a long time now on the internet for an easy solution but without success. Hopefully someone will help me here! Many thanks for your time in advance.

Probably a better question for https://bioinformatics.stackexchange.com/. Either you'll need to parse the CIGAR string or use a tool that can. — MrFlick, Aug 20 '21 at 21:08
Hi wennj, I think you'll be more likely to get an answer here if you create a *tiny* `.bam` file (see [here](https://www.biostars.org/p/48719/) how to subset), provide it in an easily accessible location, show how the data appears in R, and very succinctly show your expected output. Right now your question has a bunch of extraneous information that most programmers on this site won't understand and isn't actually necessary to answer. — Ian Campbell, Aug 21 '21 at 13:06
@IanCampbell Hi Ian! Ah ok I understand what you mean. I will prepare the data and see if I can present it better with a concrete example. Will it be better du modify the question above or open a new thread? — wennj, Aug 21 '21 at 15:24
Please click the [edit] link under the question and modify the existing one. Edits automatically move the question to the top of "active questions" for more exposure. — Ian Campbell, Aug 21 '21 at 15:36

Extract read names and the associated nucleotides on specific positions from a BAM file (in R)

0 Answers0