2

I am using to_tsvector, and it works fine. But there is a new requirement to preserve the exact original words (raw text), so I need a way to "bypass" the dictionary.

Something like to_tsvector('raw', myString), where myString is a value such as "AATT GAA" that has no meaning in any dictionary.

Peter Krauss

2 Answers

5

SELECT to_tsvector('simple','your string');
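
As a hedged illustration with the sample string from the question: the built-in `simple` configuration does no stemming and has no stop-word list by default, but it still lowercases each token, so the original case is not preserved.

```sql
-- 'simple' skips stemming and stop-word removal, but lowercases tokens
SELECT to_tsvector('simple', 'AATT GAA');
-- expected: 'aatt':1 'gaa':2
```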

Dmitry S
  • Thanks @DanielVérité and Dmitry. – Peter Krauss Mar 04 '16 at 13:18
  • Do you know how to exclude stop words in a certain language? No stemming but language-based stop words – sparkle Oct 04 '19 at 12:29
  • If you want to avoid stopwords you could create a text search dictionary which simply used a blank file of stop words. For instance, see link for specifying a stopwords file in your dictionary. Then you'll need to create a text search configuration that uses that dictionary. https://www.postgresql.org/docs/current/textsearch-dictionaries.html#TEXTSEARCH-STOPWORDS – Peter Gerdes May 25 '22 at 02:50
  • Or better yet just directly skip them: https://stackoverflow.com/questions/1497895/can-i-configure-postgresql-programmatically-to-not-eliminate-stop-words-in-full – Peter Gerdes May 25 '22 at 03:06
0

Just in case anyone else coming here is looking to really create a tsvector manually: you can skip to_tsvector entirely. Even with the `simple` configuration, the text still goes through the parser, which drops spaces and punctuation; e.g. `to_tsvector('simple', '\Sigma^0_1')` breaks that text up into three lexemes: 0, 1 and sigma.

As described in this part of the manual and in this answer, you simply cast a string literal to tsvector:

SELECT $$' space containing lexeme ' ' and one with position and weight':5C$$::tsvector;
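
Applied to the sample string from the question, a minimal sketch of the same cast could look like the following. The position and weight values in the second statement are made up for illustration only.

```sql
-- Plain cast: no parser and no dictionary are involved, so case is kept.
-- Lexemes are stored sorted and de-duplicated, without positions.
SELECT 'AATT GAA'::tsvector;
-- expected: 'AATT' 'GAA'

-- Positions and weights can be written by hand (hypothetical values).
SELECT 'AATT:1A GAA:2'::tsvector;
-- expected: 'AATT':1A 'GAA':2
```

Because the cast performs no normalization, the query side needs the same treatment: a tsquery built with `to_tsquery('simple', ...)` would lowercase its terms and so would not match `'AATT'`, whereas a direct `'AATT'::tsquery` cast keeps the case.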

Peter Gerdes