6

Probem: When searching texts with multiple occurrences of words similar to the search query they get higher rank than texts with one exact match.

Example: Say search query is "productivity", then "production of organic products" gets higher rank than "labour productivity" just because it contains two similar words, though no exact word matches.

Question: What's the best way to prioritize records that have exact matches?

Here's my pg_scope:

  pg_search_scope :search,
                  :against => {
                    :title => 'A',
                    :search_string => 'B'
                  },
                  :using => {
                    :tsearch => {
                      :dictionary => 'english',
                      :any_word => 'true'
                    }
                  }

Thx.

vrepsys
  • 2,143
  • 4
  • 25
  • 37
  • Did you ever figure this out? – Kathan Dec 29 '15 at 18:25
  • @Kathan I didn't find a proper solution. I ended up applying an exact match order in my query .select(Dataset.arel_table[:title].matches("%#{q}%").to_sql << 'as exact_match').order('exact_match desc').search(q) But if I was doing it again I would not use pg_search and use elastic search instead https://www.elastic.co/ – vrepsys Dec 30 '15 at 14:02
  • You could check out fuzzystrmatch and use levenshtein to give check for exactness https://www.postgresql.org/docs/9.1/fuzzystrmatch.html – Int'l Man Of Coding Mystery Feb 12 '20 at 12:03

1 Answers1

0

If you can manipulate the search query, have a look around at cover density ranking (ts_rank_cd() instead of ts_rank()) and its normalization parameter.

SELECT v, ts_rank(to_tsvector('english', v), to_tsquery('english', 'productivity')) rank,
       ts_rank_cd(to_tsvector('english', v), to_tsquery('english', 'productivity')) rankcd,
       ts_rank_cd(to_tsvector('english', v), to_tsquery('english', 'productivity'), 4) rankcd4,
       ts_rank_cd(to_tsvector('english', v), to_tsquery('english', 'productivity'), 6) rankcd6
FROM (
  VALUES ('production of organic products'::TEXT),
         ('labour productivity'),
         ('labour productivity with more unrelated words'),
         ('labour productivity with more unrelated words and again production'),
         ('production of productivity'),
         ('product production')
) d(v)

SQLFiddle

However, I'm not sure how to set up pg_search_scope with custom normalization.

Alternatively, you might be interested in trigram searches as well, they offer more exact (character by character) results.

pozs
  • 34,608
  • 5
  • 57
  • 63