Postgresql fulltext search match nearby words, but i want to search the exact word in large content table (thousands of words in hundreds of thousands of lines). My search must be very fast (less than a second). Using like or ilike is to slow( with 200000 words: more than 5 seconds). Does anyone have a query to advise me?
Asked
Active
Viewed 1,913 times
2
-
Please clarify why PostgreSQL full text search does not fit the bill. – Laurenz Albe Jun 28 '18 at 14:43
-
because if i search "servant" for example, postgresql full text search also match "server", but in my case i want that postgresql only match the exact word "servant", and not nearby words like "server" or other. – Mike.S Jun 29 '18 at 08:12
-
Oh, that's simple. Just use a dictionary that does *not* apply stemming. – Laurenz Albe Jun 29 '18 at 12:26
-
How can I do that? have you an exemple? I use french dictionnary but an english dictionnary with good explanation could help me. – Mike.S Jul 02 '18 at 07:00
-
I have my own dictionary using this article some adaptations http://shisaa.jp/postset/postgresql-full-text-search-part-2.html – Mike.S Jul 02 '18 at 08:46
1 Answers
5
You should be able to solve your problem with PostgreSQL full text search if you use the simple
dictionary and create an appropriate GIN index:
CREATE TABLE haystack (id serial PRIMARY KEY, string text NOT NULL);
INSERT INTO haystack (string) VALUES ('I am your servant');
INSERT INTO haystack (string) VALUES ('What use is a server without a client?');
CREATE INDEX haystack_fts_ind ON haystack USING gin (to_tsvector('simple', string));
Let's disable sequential scans so the index is used, even though the example table is too small:
SET enable_seqscan=off;
Now only exact matches are found, and no stemming takes place:
SELECT * FROM haystack
WHERE to_tsvector('simple', string) @@ to_tsquery('simple', 'servant');
id | string
----+-------------------
1 | I am your servant
(1 row)
The index can be used to speed up the query:
EXPLAIN (COSTS off) SELECT * FROM haystack
WHERE to_tsvector('simple', string) @@ to_tsquery('simple', 'servant');
QUERY PLAN
------------------------------------------------------------------------------------------
Bitmap Heap Scan on haystack
Recheck Cond: (to_tsvector('simple'::regconfig, string) @@ '''servant'''::tsquery)
-> Bitmap Index Scan on haystack_fts_ind
Index Cond: (to_tsvector('simple'::regconfig, string) @@ '''servant'''::tsquery)
(4 rows)

Laurenz Albe
- 209,280
- 17
- 206
- 263