5

When trying to find "Harrison Ford" in a document, pg_search will return any text that contains 'Harrison' and 'Ford', for example:

  pg_search_scope :search_by_full_name, :against => [:first_name, :last_name]

People.search_by_full_name("Harrison Ford")

can return:

George Harrison drives a Ford Focus

How can I make sure only exact matches of 'Harrison Ford' will return?

Dmitry S
  • 4,990
  • 2
  • 24
  • 32
Ashbury
  • 2,160
  • 3
  • 27
  • 52

2 Answers2

1

You need to use pg_search normalization, or basically ranking search results in Postgres. I did not even use normalization factor in the following examples:

SELECT ts_rank_cd(vector,query) as rank 
FROM 
  to_tsvector('simple','George Harrison drives a Ford Focus') as vector,
  to_tsquery('simple','Harrison & Ford') as query;

Output 1:

   rank    
-----------
 0.0333333
(1 row)

If you have Harrison and Ford together - rank will be higher:

SELECT ts_rank_cd(vector,query) as rank
FROM
  to_tsvector('simple','Harrison Ford drives a car') as vector,
  to_tsquery('simple','Harrison & Ford') as query;

Output 2:

 rank 
------
  0.1
(1 row)

If you ORDER BY rank DESC all you search results, you will get what you need, because all the search words which are next to each other will the highest rank and will be at the top of your search result list.

Dmitry S
  • 4,990
  • 2
  • 24
  • 32
  • Sorry, I don't understand the normalization algorithm choices. What threshold do I set to disregard anything other than an exact match? – Ashbury Jan 07 '16 at 11:44
  • the point here is that words which are next to each other will have the highest rank if you `ORDER BY rank DESC` - you will get to the top what you need; in my example I did not even use normalization factor @Ashbury – Dmitry S Jan 07 '16 at 12:46
  • The problem is that this could still return a series of near matches (and no exact matches) ordered by how close it is to the original. Even if I took the top result there's no guarantee its an exact match. – Ashbury Jan 07 '16 at 17:58
  • @Ashbury if you are looking for exact **phrase** matching - maybe you need to check [pattern matching](http://www.postgresql.org/docs/9.4/static/functions-matching.html) not full text searh – Dmitry S Jan 07 '16 at 18:05
  • 1
    Thanks, I'm going to do full_text search then use pattern matching on the results. pg_search with tvector takes about ~11ms, then pattern matching the results is 7ms VS pattern matching the full table (460ms). Saving 442ms (in my little test) – Ashbury Jan 07 '16 at 19:37
0

You just need to replace the space character by <=> in the query search term as follows:

query.gsub!(' ', '<=>')
search_by_full_name(query)