2

I am trying to access methods provided by htsjdk.jar from here: https://samtools.github.io/htsjdk/

and documented here: https://samtools.github.io/htsjdk/javadoc/htsjdk/index.html

using jython. I need methods for accessing / querying BAM file index (BAI file) to get start-end positions in binary BAM file. The test BAM & BAI files can be obtained i.e from: https://github.com/samtools/htsjdk/tree/master/testdata/htsjdk/samtools/BAMFileIndexTest

In jython 2.7.0 after putting in Jython registry:

python.security.respectJavaAccessibility = false
#I did in the jython comandline:
import sys
sys.path.append("/usr/local/soft/picard_1.138/htsjdk-1.138.jar")
from htsjdk.samtools import *
from java.io import File
#the BAM index file + BAM files 
bai_fh = File("./index_test.bam.bai")
mydict = SAMSequenceDictionary()
bai_foo = DiskBasedBAMFileIndex(bai_fh, mydict)

I can access some methods, like bai_foo.getNumberOfReferences() etc., but the desired method
getBinsOverlapping(int referenceIndex, int startPos, int endPos) is in BrowseableBAMIndex interface.

But I am lost when it comes to subclassing Java classes in Jython. The task is to get the list of BAM file chunk(s) corresponding to a given genomic position. For the test BAM/BAI files, i.e. for chrM 10000-15000 (chromosome, start_pos, end_pos) I get 11 mapped reads using off the shelf samtools standalone program instead of htsjdk:

samtools view index_test.bam  chrM:10000-15000

Many thanks for your help

Darek

Edit: re the groovy part Groovy Version: 2.4.4

groovy -cp libs/htsjdk-1.138.jar test_htsjdk.groovy

#!/usr/bin/env groovy

import htsjdk.samtools.*

File bam_fh =  new File("./A.bam")
File bai_fh =  new File("./A.bam.bai")

def mydict = new SAMSequenceDictionary()
def bai_foo = new DiskBasedBAMFileIndex(bai_fh, mydict)
println bai_foo.getNumberOfReferences()

Above code works in groovy. My problem is not that this code does not work, but that I do not know the proper way of accessing the methods from Java classes dealing with BAI file format. I did search for AbstractBAMFileIndex in htsjdk/src/java/htsjdk/samtools/*java files (git clone from repo@github), but it is still not clear what I need to do.

darked89
  • 332
  • 1
  • 2
  • 17
  • 2
    What's this to do with groovy? Apart from you put groovy in the title – tim_yates Sep 02 '15 at 06:24
  • @tim_yates: Sorry for the confusion. I assumed that the solution depends less on use jython vs groovy and more at finding a way to instantiate/subclass one of htsjdk.samtools Java classes. And while I am coming from python, I will be happy if I get it working in python or groovy. Or Jruby if that will be the first choice for somebody patient enough to figure out relevant parts of htsjdk.samtools. – darked89 Sep 02 '15 at 16:29
  • What int reference index for getBinsOverlapping does "chrM" correspond to? – WillShackleford Sep 09 '15 at 23:07
  • I would use groovy directly as you can copy and paste the java using examples without having to modify your code. – Nathan Dunn Sep 11 '15 at 22:52

2 Answers2

3

There is one example on their github, I've attached a small section but there are more examples of how to use their SamReader

    /**
     * Broken down
     */
    final SamReaderFactory factory =
            SamReaderFactory.makeDefault().enable(SamReaderFactory.Option.VALIDATE_CRC_CHECKSUMS).validationStringency(ValidationStringency.LENIENT);

    final SamInputResource resource = SamInputResource.of(new File("/my.bam")).index(new URL("http://broadinstitute.org/my.bam.bai"));

    final SamReader myReader = factory.open(resource);

    for (final SAMRecord samRecord : myReader) {
        System.err.print(samRecord);
    }

Groovy is pretty awesome for interacting with a bunch of files regardless of format.

new File('my/directory/with/many/files').eachFile{File readFile ->
    final SamInputResource resource = SamInputResource.of(readFile).index(new URL('IGotLazy.bai'))
    final SamReader myReader = ... etc

}
ScientificMethod
  • 493
  • 3
  • 10
2

I didn't know what to put for the reference index but the printout looked good enough to me as someone who never deals with this kind of data:

import sys
sys.path.append("/home/shackle/sam-tools/samtools-htsjdk-f650176/dist/htsjdk-1.138.jar")
from htsjdk.samtools import *
from java.io import File
#the BAM index file + BAM files 
indexFile = File("/home/shackle/index_test.bam.bai")
bamFile = File("/home/shackle/index_test.bam")
sfr = SAMFileReader(bamFile, indexFile)
sfr.enableIndexCaching(True)
bbi = sfr.getBrowseableIndex()
for i in range(0,256):
    print "i = " , i
    bl = bbi.getBinsOverlapping(i, 10000, 15000)
    count = 0
    for bin in bl:
        print "bin.getBinNumber() = " , bin.getBinNumber()
        count = count + 1
        print "count = ", count
WillShackleford
  • 6,918
  • 2
  • 17
  • 33