0

I have a table of events where each event has a title and description. Searches should search both columns:

title: Dick(ens) and Jane
description: Meet weekly to discuss classic books!

Given the above, I would like a query of book club to match.

My index:

CREATE INDEX evsearch_idx on events using gist((title || ' ' || description) gist_trgm_ops)

My query:

SELECT * FROM events WHERE title || ' ' || description ILIKE '%book club%'

I think the problem is that I'm fundamentally misunderstanding how an index works and therefore it's attempting to match the entire string book club to various three-letter combinations but I'm not sure how to fix this.

Matt
  • 2,953
  • 3
  • 27
  • 46
  • Maybe you want `SELECT * FROM events WHERE title || ' ' || description SIMILAR TO '%(book|club)%'` [demo](https://dbfiddle.uk/?rdbms=postgres_11&fiddle=e79d5a66a50c35458b9e9a245f8d297e) – Lukasz Szozda Sep 08 '18 at 06:54
  • If I search for `jane`, nothing is returned in this case. Do I need to use `ILIKE` for one word and `SIMILAR TO` when finding multiple? – Matt Sep 08 '18 at 07:43
  • `SELECT * FROM events WHERE title || ' ' || description SIMILAR TO '%(Jane)%'` – Lukasz Szozda Sep 08 '18 at 09:04

1 Answers1

0

ILIKE will always try to match the whole string. For example, to match ILIKE '%book club%', a string needs to include the whole string book club.

If you want to use the similarity matching potential of pg_trm, you need to use the matching operators for which it was designed. Taken from the doc:

  1. % returns true if its arguments have a similarity that is greater than the current similarity threshold set by pg_trgm.similarity_threshold.
  2. <% returns true if the similarity between the trigram set in the first argument and a continuous extent of an ordered trigram set in the second argument is greater than the current word similarity threshold set by pg_trgm.word_similarity_threshold parameter.

Side note #1: Make it case insensitive by adding lower(...) in your index definition and in your queries.

Side note #2: title || ' ' || description will return NULL if title or description is NULL. If you want to guard against one of the 2 values being NULL, create the index on COALESCE(title, '') || ' ' || COALESCE(description, '')

Side note #3: pg_trgm can only do so much. For really complex queries, you might want to look into full text search

Matthieu Libeer
  • 2,286
  • 1
  • 12
  • 16