
I'm trying to create a GIN index for fast similarity search on bitstrings in PostgreSQL 9.6.

I use the Tanimoto/Jaccard similarity metric:

tanimoto = popcount(bs1 & bs2)/popcount(bs1 | bs2)

where 0 means no similarity at all and 1 means the bitstrings are identical.

Update:

popcount is the number of set (true) bits; bs1 and bs2 are bitstrings.
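
For reference, a minimal sketch of how `popcount` and `tanimoto` could be written as plain SQL functions on 9.6. The function bodies are my own assumption, not a built-in API: 9.6 has no native bit-count function for `bit`, so this version counts the `1` characters in the text form of the bitstring. Returning 0 when both bitstrings are all zeros is also an assumed convention.

```sql
-- Number of set bits: strip the '0' characters from the text form
-- of the bitstring and count what is left.
CREATE FUNCTION popcount(bs bit) RETURNS integer AS $$
    SELECT length(replace(bs::text, '0', ''));
$$ LANGUAGE sql IMMUTABLE STRICT;

-- Tanimoto/Jaccard similarity of two bitstrings of equal length.
-- Assumption: two all-zero bitstrings yield 0 instead of a division by zero.
CREATE FUNCTION tanimoto(bs1 bit, bs2 bit) RETURNS double precision AS $$
    SELECT CASE
               WHEN popcount(bs1 | bs2) = 0 THEN 0::double precision
               ELSE popcount(bs1 & bs2)::double precision / popcount(bs1 | bs2)
           END;
$$ LANGUAGE sql IMMUTABLE STRICT;

-- 2 shared bits / 3 bits in the union
SELECT tanimoto(B'11100000', B'11000000');
```

Both functions are marked `IMMUTABLE` so they can be used in index expressions. The `&` and `|` operators require both bitstrings to have the same length.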

I use query like this:

SELECT bs, tanimoto(B'11100000', bs) as t FROM test WHERE bs % B'11100000'

on table:

 CREATE TABLE test (bs bit(8));

The % operator returns a boolean: tanimoto(bs, B'11100000') > threshold

The threshold is configurable.
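
A sketch of how such an operator could be defined, assuming a `tanimoto(bit, bit)` function like the one used in the query above already exists. The setting name `tanimoto.threshold`, the helper names `tanimoto_threshold` and `tanimoto_similar`, and the fallback value 0.5 are all hypothetical choices made for illustration.

```sql
-- Read the threshold from a custom session setting.
-- current_setting(name, true) returns NULL if the setting was never set,
-- and an empty string after a RESET, so both cases fall back to 0.5.
CREATE FUNCTION tanimoto_threshold() RETURNS double precision AS $$
    SELECT coalesce(
               nullif(current_setting('tanimoto.threshold', true), ''),
               '0.5'
           )::double precision;
$$ LANGUAGE sql STABLE;

-- Boolean wrapper used as the operator's underlying function.
-- STABLE, not IMMUTABLE, because the result depends on a setting.
CREATE FUNCTION tanimoto_similar(bs1 bit, bs2 bit) RETURNS boolean AS $$
    SELECT tanimoto(bs1, bs2) > tanimoto_threshold();
$$ LANGUAGE sql STABLE STRICT;

CREATE OPERATOR % (
    LEFTARG    = bit,
    RIGHTARG   = bit,
    PROCEDURE  = tanimoto_similar,
    COMMUTATOR = %
);

-- Usage: change the threshold for the current session, then query.
SET tanimoto.threshold = 0.6;
SELECT bs, tanimoto(B'11100000', bs) AS t FROM test WHERE bs % B'11100000';
```

On its own this operator cannot use any index; it only gives the planner something an operator class could later refer to.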

How can I make this operator use a GIN index, or how do I implement an operator class if one is needed?

I want to implement the index described in http://pubs.acs.org/doi/abs/10.1021/ci200552r
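
To make the goal more concrete, here is a rough sketch of a workaround that needs no custom operator class. It is not the method from the linked paper, only a coarse prefilter: the bitstring is turned into an array of its set-bit positions, a GIN index is built on that expression with the built-in array operator class, and `&&` (array overlap) narrows the candidates before the exact `%` check. The function name `set_bits` and the index name are hypothetical. The prefilter is only valid for a threshold of 0 or more, since any pair with similarity above 0 must share at least one set bit.

```sql
-- Positions (1-based, from the left) of the set bits in a bitstring.
CREATE FUNCTION set_bits(bs bit) RETURNS integer[] AS $$
    SELECT coalesce(array_agg(i ORDER BY i), '{}'::integer[])
    FROM generate_series(1, length(bs)) AS i
    WHERE get_bit(bs, i - 1) = 1;   -- get_bit is 0-based
$$ LANGUAGE sql IMMUTABLE STRICT;

-- Expression index using the default GIN operator class for arrays.
CREATE INDEX test_bs_bits_gin ON test USING gin (set_bits(bs));

SELECT bs, tanimoto(B'11100000', bs) AS t
FROM test
WHERE set_bits(bs) && set_bits(B'11100000')  -- index-assisted prefilter
  AND bs % B'11100000';                      -- exact threshold check
```

This only discards rows that share no bits with the query, so it is far less selective than what the paper describes, which is why I am asking about a proper operator class.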

  • Please be more specific. Table definition, Postgres version, explain "tanimoto/jaccard" or add a link to details. Where do `bs1` and `bs2` come from? Same row or dynamically distinct rows? – Erwin Brandstetter Jan 14 '17 at 16:12
  • Sorry to dig up old corpses. Tanimoto distance is a measure of the similarity between two bitstrings. It is much used in computational chemistry, where the (sometimes large) bitmask represents molecular fragments. The Tanimoto distance is then used to find chemical compounds similar to a single given compound that you are searching for. So bs1 comes out of the table of compounds and bs2 is calculated from the given compound. – Ellert van Koperen Feb 23 '18 at 21:12

0 Answers