1

In the database, all tables were initially set to the wrong sort order: LC_COLLATE=C and LC_CTYPE = C. It turned out that the search for the player by nickname does not work for the Cyrillic alphabet. So I decided to change the sorting rules for a separate column with a nickname:

CREATE COLLATION IF NOT EXISTS english (provider = icu, locale = 'en_US');
ALTER TABLE player
    ALTER COLUMN nickname SET DATA TYPE character varying(64) COLLATE "english";

Next, I wanted to do a quick search for a player by nickname using trigram indexes. I created a gin index:

CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX CONCURRENTLY player_nickname_idx_gin ON player USING gin (nickname gin_trgm_ops);

After that the search for the nickname was greatly accelerated in Latin alphabet and in the query plan it is clear that the gin index is used.

EXPLAIN ANALYSE
SELECT * FROM player
WHERE nickname like '%qwerty%';

QUERY PLAN
"Bitmap Heap Scan on player  (cost=81.19..6172.98 rows=1702 width=266) (actual time=2.788..6.735 rows=580 loops=1)"
"  Recheck Cond: ((nickname)::text ~~ '%qwerty%'::text)"
"  Rows Removed by Index Recheck: 448"
"  Heap Blocks: exact=1021"
"  ->  Bitmap Index Scan on player_nickname_idx_gin  (cost=0.00..80.77 rows=1702 width=0) (actual time=2.446..2.447 rows=1028 loops=1)"
"        Index Cond: ((nickname)::text ~~ '%qwerty%'::text)"
"Planning Time: 0.304 ms"
"Execution Time: 6.869 ms"

But the same query doesn't work with Cyrillic alphabet...

EXPLAIN ANALYSE
SELECT * FROM player
WHERE nickname like '%абвгд%';

QUERY PLAN
"Gather  (cost=1000.00..122898.55 rows=532 width=266) (actual time=190.395..1243.729 rows=27 loops=1)"
"  Workers Planned: 2"
"  Workers Launched: 2"
"  ->  Parallel Seq Scan on player  (cost=0.00..121845.35 rows=222 width=266) (actual time=256.894..1103.830 rows=9 loops=3)"
"        Filter: ((nickname)::text ~~ '%абвгд%'::text)"
"        Rows Removed by Filter: 1773005"
"Planning Time: 0.252 ms"
"JIT:"
"  Functions: 12"
"  Options: Inlining false, Optimization false, Expressions true, Deforming true"
"  Timing: Generation 10.563 ms, Inlining 0.000 ms, Optimization 5.027 ms, Emission 59.776 ms, Total 75.367 ms"
"Execution Time: 1248.374 ms"

I also tried to set a collate for the gin index, but nothing changed. Tell me how to get the gin index to take trigrams not based on the default collation of the base? Is there any way to avoid rebuilding the cluster in this situation?

skipkolch
  • 11
  • 2

0 Answers0