2

Postgresql fulltext search match nearby words, but i want to search the exact word in large content table (thousands of words in hundreds of thousands of lines). My search must be very fast (less than a second). Using like or ilike is to slow( with 200000 words: more than 5 seconds). Does anyone have a query to advise me?

Mike.S
  • 47
  • 10
  • Please clarify why PostgreSQL full text search does not fit the bill. – Laurenz Albe Jun 28 '18 at 14:43
  • because if i search "servant" for example, postgresql full text search also match "server", but in my case i want that postgresql only match the exact word "servant", and not nearby words like "server" or other. – Mike.S Jun 29 '18 at 08:12
  • Oh, that's simple. Just use a dictionary that does *not* apply stemming. – Laurenz Albe Jun 29 '18 at 12:26
  • How can I do that? have you an exemple? I use french dictionnary but an english dictionnary with good explanation could help me. – Mike.S Jul 02 '18 at 07:00
  • I have my own dictionary using this article some adaptations http://shisaa.jp/postset/postgresql-full-text-search-part-2.html – Mike.S Jul 02 '18 at 08:46

1 Answers1

5

You should be able to solve your problem with PostgreSQL full text search if you use the simple dictionary and create an appropriate GIN index:

CREATE TABLE haystack (id serial PRIMARY KEY, string text NOT NULL);
INSERT INTO haystack (string) VALUES ('I am your servant');
INSERT INTO haystack (string) VALUES ('What use is a server without a client?');

CREATE INDEX haystack_fts_ind ON haystack USING gin (to_tsvector('simple', string));

Let's disable sequential scans so the index is used, even though the example table is too small:

SET enable_seqscan=off;

Now only exact matches are found, and no stemming takes place:

SELECT * FROM haystack
WHERE to_tsvector('simple', string) @@ to_tsquery('simple', 'servant');

 id |      string       
----+-------------------
  1 | I am your servant
(1 row)

The index can be used to speed up the query:

EXPLAIN (COSTS off) SELECT * FROM haystack
WHERE to_tsvector('simple', string) @@ to_tsquery('simple', 'servant');

                                        QUERY PLAN                                        
------------------------------------------------------------------------------------------
 Bitmap Heap Scan on haystack
   Recheck Cond: (to_tsvector('simple'::regconfig, string) @@ '''servant'''::tsquery)
   ->  Bitmap Index Scan on haystack_fts_ind
         Index Cond: (to_tsvector('simple'::regconfig, string) @@ '''servant'''::tsquery)
(4 rows)
Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263