I'm trying to perform a substructure search on a chemical database, using Avalon fingerprints precomputed for every compound. RDKit has a method for comparing these fingerprints:

DataStructs.AllProbeBitsMatch(fp1, fp2)

The docs describe this method as follows: "Returns True if all bits in the first argument match all bits in the vector defined by the pickle in the second argument".
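To make that description concrete: as I read it, the check passes when every bit set in the first fingerprint (the probe) is also set in the second. A minimal sketch of that reading, using plain Python integers as stand-ins for bit vectors (the function name is hypothetical and this is not RDKit's implementation):

```python
def all_probe_bits_match(probe: int, target: int) -> bool:
    """Return True when every bit set in `probe` is also set in `target`.

    Plain ints stand in for bit vectors here; this only illustrates the
    assumed semantics of the check, not RDKit's own code.
    """
    return (probe & target) == probe


if __name__ == "__main__":
    # Probe bits 0 and 2 are both present in the target.
    print(all_probe_bits_match(0b0101, 0b1101))
    # Probe bit 2 is missing from the target.
    print(all_probe_bits_match(0b0101, 0b1001))
```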

The docs talk about bit vectors, but this fingerprint can also be computed "as words" (an array of integers, via the GetAvalonFPAsWords method in RDKit). I could store that array in MongoDB and, hopefully, run the search without RDKit, relying only on the database, which I expect to be much faster.

So this is my question: I need an operation on integer arrays that is equivalent to AllProbeBitsMatch on bit vectors. Ideally it should run inside MongoDB, probably using the aggregation framework for better performance.
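For illustration, here is a sketch of what I mean, assuming both fingerprints are equal-length lists of non-negative integer words: the check reduces to a word-by-word `(q & t) == q`. The second helper builds a filter document around MongoDB's `$bitsAllSet` query operator (available from MongoDB 3.2); the field name `fp` and both function names are hypothetical, and I have not benchmarked whether such a query is fast.

```python
from typing import Dict, List


def words_match(query_words: List[int], target_words: List[int]) -> bool:
    """Word-wise version of the probe check: every bit set in a query
    word must also be set in the target word at the same position."""
    if len(query_words) != len(target_words):
        raise ValueError("fingerprints must have the same number of words")
    return all((q & t) == q for q, t in zip(query_words, target_words))


def mongo_filter(query_words: List[int], field: str = "fp") -> Dict[str, dict]:
    """Build a filter document with one $bitsAllSet condition per non-zero
    query word. `field` is a hypothetical array field holding the words."""
    return {
        f"{field}.{i}": {"$bitsAllSet": word}
        for i, word in enumerate(query_words)
        if word  # a zero word constrains nothing, so skip it
    }


if __name__ == "__main__":
    print(words_match([0b0001, 0], [0b0011, 0b0100]))
    print(mongo_filter([5, 0, 8]))
```

The resulting dictionary could be passed as the filter to a `find` call. Documents that pass are only candidates, so a full substructure match would still be needed afterwards.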

This is the article on RDKit and Avalon fingerprints that I use for reference: http://rdkit.blogspot.com/2013/11/fingerprint-based-substructure.html

Artico
  • Have you tried the Postgres cartridge? You are unlikely to get even close to its performance with the aggregation framework unless you want to extend MongoDB's indexing capabilities. – zero323 Jun 26 '15 at 14:47
  • I haven't tried the Postgres cartridge. I'd like to use a NoSQL database for a number of reasons. I'm a bit new to the topic, but there is some research showing comparable performance from MongoDB for similarity search: http://blog.matt-swain.com/post/87093745652/chemical-similarity-search-in-mongodb – Artico Jun 29 '15 at 12:14

0 Answers