1

I have a Postgres database with around 1.5 million records. In my Ruby on Rails app, I need to search the statement_text field (which can contain anywhere from 1 to hundreds of words).

My problem: I know I can use the pgSearch gem to create scopes like search_all_words or search_any_words, but I'm uncertain what is the most efficient way to ensure only records with the exact match are returned in the result set.

That is, if I search "Pope Francis", I want it to find only those two words when they're consecutive and in the same order (as opposed to, say, "The pope is named Francis").

So far, I've just combined a GIN index with ILIKE for exact match searches. But given that a GIN index essentially works by storing the exact position of a word in every record, shouldn't there be a more efficient (non-ILIKE) way of ensuring that the search term is an exact match with the field?

Dmitry S
  • 4,990
  • 2
  • 24
  • 32
jayp
  • 192
  • 2
  • 13

1 Answers1

1

Generally speaking, full-text requires word stemming based on language dictionary used, so with With Full-Text search you can use ts_rank() function without stemming and with 'simple' dictionary to determine the relevance of the phrase you are searching for.

WITH t(v) AS ( VALUES
  ('Test sentence with Pope Francis'),
  ('Test Francis sentence with Pope '),
  ('The pope is named Francis')
)
SELECT v,ts_rank(tsv,q) as rank
FROM t,
    to_tsvector('simple',v) as tsv,
    plainto_tsquery('simple','Pope Francis') AS q;

Result:

                v                 |   rank    
----------------------------------+-----------
 Test sentence with Pope Francis  | 0.0991032
 Test Francis sentence with Pope  | 0.0973585
 The pope is named Francis        | 0.0973585
(3 rows)

Without full-text search, you can implement just faster ILIKE pattern matching with pg_trgm extension. Example is here.

Dmitry S
  • 4,990
  • 2
  • 24
  • 32
  • Thanks, Dmitry! I am using full-text search so I can use ts_rank. The problem, though, is that -- depending on the field's word count -- there's no specific cutoff which will ensure that the words are adjacent and in the correct order, without using ILIKE. Am I understanding that correctly? – jayp Apr 12 '16 at 00:43
  • no, there is no cut off based on the length. If there is a match for phrase, it will still have higher rank. you may omit normalization factor for ts_rank - it should work fine. – Dmitry S Apr 12 '16 at 03:41