
I want to implement an information retrieval system that uses the vector space model, but with multi-term tokens and a custom term-weighting function.

I am considering building my inverted index in PostgreSQL instead of on the file system. I read about the GIN index, which can build such an index on a tsvector column.

Can I build tsvector values manually, without calling the to_tsvector function, so that I can build my "custom" vector with custom tokens and custom weights?

Nina

2 Answers


You can write tsvector values by hand. As far as I know, though, you can only assign four different weights: A, B, C, or D. Multi-word tokens have to be enclosed in single quotes to keep them together as one token.

select $$'two words':1c oneword$$::tsvector;
         tsvector         
--------------------------
 'oneword' 'two words':1C
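
To tie this back to the GIN index mentioned in the question, here is a minimal sketch of storing hand-written tsvector values in a table, indexing them, and querying them with a hand-written tsquery. The table name `docs`, its columns, and the index name are hypothetical and exist only for illustration; the weights are still limited to the A to D labels.

```sql
-- Hypothetical table; "docs", "id" and "vec" are illustrative names only.
CREATE TABLE docs (
    id  serial PRIMARY KEY,
    vec tsvector NOT NULL
);

-- Hand-written tsvector literals; to_tsvector is never called,
-- so no parsing, stemming, or stop-word removal takes place.
INSERT INTO docs (vec) VALUES
    ($$'two words':1C 'oneword':2$$::tsvector),
    ($$'another phrase':1A 'oneword':5B$$::tsvector);

-- GIN index over the manually built column.
CREATE INDEX docs_vec_gin ON docs USING gin (vec);

-- The query is also written by hand, with the multi-word token quoted
-- so that it is treated as a single lexeme rather than two.
SELECT id, vec
FROM docs
WHERE vec @@ $$'two words'$$::tsquery;
```

Casting the query string directly to tsquery, rather than going through to_tsquery, matters here: it keeps the quoted token exactly as written, so it can match the lexeme stored in the vector.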
jjanes
  • But the weights I need to assign to terms are modified TF-IDF values, not 'A', 'B' or 'C'! Is that possible or not? – Nina Jan 11 '20 at 11:06
  • @Nina Not possible with the tsvector. – jjanes Jan 11 '20 at 14:17
  • Emmm, so I can't rely on PostgreSQL for my information retrieval system. I will mark your post as the answer anyway, since you gave a manual way to build a tsvector. – Nina Jan 11 '20 at 14:26
  • If you are going so far as to build your own indexes you could just add a new type to postgres and use their GIN indexes on your data type. You'd have to write an extension to do it but you can certainly make use of the GIN index on custom data types. – Peter Gerdes May 25 '22 at 02:56

In case it is helpful to anyone, building on the original answer, you can also give each lexeme an explicit position:

select $$'foo':1 'bar':2 'baz':10$$::tsvector;
         tsvector         
--------------------------
 'bar':2 'baz':10 'foo':1
(1 row)
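
If you want to check what a hand-written value actually contains, or build one from a list of tokens instead of a literal, the built-in functions `unnest(tsvector)` and `array_to_tsvector(text[])` may help. This is only a sketch and assumes a PostgreSQL version that provides both functions.

```sql
-- Expand a hand-written tsvector into one row per lexeme,
-- with its positions and weight labels.
SELECT lexeme, positions, weights
FROM unnest($$'foo':1 'bar':2A 'baz':10$$::tsvector);

-- Build a tsvector from an array of tokens. Each array element becomes
-- one lexeme as-is, so a multi-word token stays together; the result
-- carries no positions or weights.
SELECT array_to_tsvector(ARRAY['two words', 'oneword']);
```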
michael