postgres LIKE % operator vs similarity - pg_trgm.similarity_threshold . different query plans

Question

If I use:

SET pg_trgm.similarity_threshold = 0.9;
... where column % 'some string s';

PostgreSQL does an index scan on: gin (column gin_trgm_ops)

In contrast, which should be the same:

... where similarity(column, 'some string s' ) >= 0.9

That does a seqential scan instead of using the index.

From the documentation: https://www.postgresql.org/docs/11/pgtrgm.html

text % text boolean
Returns true if its arguments have a similarity that is greater than the current similarity threshold set by pg_trgm.similarity_threshold.

Why?

IIRC postgres cannot apply indices on function calls but only on operators — Bergi, Jun 26 '23 at 12:46

Laurenz Albe · Accepted Answer · 2023-06-26T13:39:36.800

2

Fundamentally, a WHERE condition must look like this to allow an index scan:

<indexed expression> <operator> <constant>

Here, <indexed expression> is what you created the index on <operator> is an operator supported by the operator family of the index's operator class, and <constant> is an expression that is constant for the duration of the index scan (in particular, it can only contain STABLE functions).

The one exception in PostgreSQL is if you use a function returning boolean in the WHERE condition, and that function has an optimizer support function that allows PostgreSQL to substitute an index scan. But similarity is no such function.

edited Jun 26 '23 at 13:39

answered Jun 26 '23 at 13:02

Laurenz Albe

209,280
17
206
263

What do you mean by "boolean function"? – Bergi Jun 26 '23 at 13:34
@Bergi A function returning a `boolean`. – Laurenz Albe Jun 26 '23 at 13:39
What would be examples of such functions - I can only find [boolean operators](https://www.postgresql.org/docs/current/functions-logical.html)? When you say that it has "*an optimiser support function*", that implies it must be a builtin one? – Bergi Jun 26 '23 at 14:12
`create index on tenk1 (starts_with(stringu1,'abc')); explain (costs off) select * from tenk1 where starts_with(stringu1,'abc');` @Bergi maybe something like this. – jian Jun 26 '23 at 14:36
@jian Nah, that's an expression index, which works on any immutable function – Bergi Jun 26 '23 at 15:11
https://git.postgresql.org/cgit/postgresql.git/tree/src/test/regress/expected/misc_functions.out#n554 maybe this one @Bergi – jian Jun 26 '23 at 15:33
@jian Yeah, but is that actually used somewhere except tests? Is there a core function that does this, or is this mostly for extensions? – Bergi Jun 26 '23 at 16:17
https://www.postgresql.org/message-id/flat/CACowWR3jZ4Xo1atU5xTDpEoeeKSw1%3D%3DsKPhCy3a29HtcE9oQVQ%40mail.gmail.com#991fe612929d7a7bc3dd53212077e638 it's a niche thing. `where hundred =1111` is same as `where int4eq(hundred,1111)` but second won't use index scan @Bergi – jian Jun 27 '23 at 01:09
1

@Bergi Most of the index support for PostGIS works that way; queries like `... WHERE st_contains(...)`. [Here](https://www.cybertec-postgresql.com/en/optimizer-support-functions/) is an article about the feature. – Laurenz Albe Jun 27 '23 at 05:08

postgres LIKE % operator vs similarity - pg_trgm.similarity_threshold . different query plans

1 Answers1