pg_search: how to prioritize exact word matches?

Question

Probem: When searching texts with multiple occurrences of words similar to the search query they get higher rank than texts with one exact match.

Example: Say search query is "productivity", then "production of organic products" gets higher rank than "labour productivity" just because it contains two similar words, though no exact word matches.

Question: What's the best way to prioritize records that have exact matches?

Here's my pg_scope:

  pg_search_scope :search,
                  :against => {
                    :title => 'A',
                    :search_string => 'B'
                  },
                  :using => {
                    :tsearch => {
                      :dictionary => 'english',
                      :any_word => 'true'
                    }
                  }

Thx.

@Kathan I didn't find a proper solution. I ended up applying an exact match order in my query .select(Dataset.arel_table[:title].matches("%#{q}%").to_sql << 'as exact_match').order('exact_match desc').search(q) But if I was doing it again I would not use pg_search and use elastic search instead https://www.elastic.co/ — vrepsys, Dec 30 '15 at 14:02
You could check out fuzzystrmatch and use levenshtein to give check for exactness https://www.postgresql.org/docs/9.1/fuzzystrmatch.html — Int'l Man Of Coding Mystery, Feb 12 '20 at 12:03

score 0 · Answer 1 · answered Jun 27 '14 at 10:07

If you can manipulate the search query, have a look around at cover density ranking (ts_rank_cd() instead of ts_rank()) and its normalization parameter.

SELECT v, ts_rank(to_tsvector('english', v), to_tsquery('english', 'productivity')) rank,
       ts_rank_cd(to_tsvector('english', v), to_tsquery('english', 'productivity')) rankcd,
       ts_rank_cd(to_tsvector('english', v), to_tsquery('english', 'productivity'), 4) rankcd4,
       ts_rank_cd(to_tsvector('english', v), to_tsquery('english', 'productivity'), 6) rankcd6
FROM (
  VALUES ('production of organic products'::TEXT),
         ('labour productivity'),
         ('labour productivity with more unrelated words'),
         ('labour productivity with more unrelated words and again production'),
         ('production of productivity'),
         ('product production')
) d(v)

SQLFiddle

However, I'm not sure how to set up pg_search_scope with custom normalization.

Alternatively, you might be interested in trigram searches as well, they offer more exact (character by character) results.

pg_search: how to prioritize exact word matches?

1 Answers1