0

Related but different question Using a GIN index in PostgreSQL with a LIKE clause reveals the trigram extension which is very powerful for similarity search and sort.

But in my case I have an array column, indexed with GIN (and normalized to upper) but I want to allow head searches to use the index as in the like operator.

create table tab (
  col text[]
);

create or replace function upper(text[]) returns text[] language sql as $$
   select upper($1::text)::text[]
$$ strict immutable parallel safe;

create index on tab using GIN(upper(col));

Now fill that with random data:

insert into tab
select array(select array_to_string(array(select substr('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789', trunc(random() * 62)::integer + 1, 1) from generate_series(1, slen)), '')
                  from (select trunc(random() * 60)::integer + 10 as slen from generate_series(1,tlen)) x) as col
  from (select id, trunc(random() * 5)::integer + 1 as tlen from generate_series(1,10000) as id) x;

Now grab a value from the middle:

 select * from tab offset 5000 limit 1;
                                                                                         col                                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 {hSQbUOKSTA1A9ISb5qx9kkG6fJ5HP3lQhaoEbM4MHgJZ,IMzcLns5waH8Gat5njGBwv4AptkmtxlfUxLyPx1VSZsitvq,pNw67x9mZ6nhJrGDONNRmpT7pbx,oRUNc9ka3f12reEW8OmzaYQufLYRAlHWGTo,PBD87EsMtupwZpVgfcxu5}
(1 row)

and look it up:

explain analyze 
select * from tab where upper(col) && upper('{oRUNc9ka3f12reEW8OmzaYQufLYRAlHWGTo}')::text[]

gives

                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tab  (cost=841.23..1103.03 rows=158 width=32) (actual time=1.526..1.549 rows=1 loops=1)
   Recheck Cond: (upper(col) && ('{ORUNC9KA3F12REEW8OMZAYQUFLYRALHWGTO}'::cstring)::text[])
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on tab_upper_idx  (cost=0.00..841.19 rows=158 width=0) (actual time=1.510..1.515 rows=1 loops=1)
         Index Cond: (upper(col) && ('{ORUNC9KA3F12REEW8OMZAYQUFLYRALHWGTO}'::cstring)::text[])
 Planning Time: 0.095 ms
 Execution Time: 1.596 ms
(7 rows)

So it does use the GIN index. I note the "cstring" data type in here, which I have not seen before.

Now I would like to be able to do a partial match also, let's find all rows ILIKE 'ORUNC%'.

select * from tab where upper(col) &&% upper('{ORUNC}')::text[]

where I have &&% as a fictitious operator which would do a prefix match.

How could that be implemented?

Gunther Schadow
  • 1,490
  • 13
  • 22

1 Answers1

0

Idea 1

Normalize your design and unnest the array into a separate table with one element per row. Then create a plain B-tree index with COLLATE "C" or an SP-GiST index to support prefix matches - in a query that's adapted accordingly. Ideally using the "starts with" operator ^@:

...
WHERE col_element ^@ 'foo';

See:

Idea 2

While stuck with the array column, use a text search index on the array converted to tsvector which supports prefix matching. You probably want the "simple" dictionary for that. And you need an IMMUTABLE wrapper for the array_to_string(), which allows anyarray input, hence only STABLE. While being at it, merge upper() into the same:

CREATE OR REPLACE FUNCTION arr_to_upper_string(_arr text[], _sep text = ' ') 
  RETURNS text
  LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE
BEGIN ATOMIC
SELECT upper(array_to_string($1, $2));
END;

COMMENT ON FUNCTION arr_to_upper_string(text[], text) IS 'array_to_string() is not immutable, as it takes "anyarray" input, and some types have no immutable conversion to text.
For text[] input it is, in fact immutable. We need an immutable fucntion for indexes. Hence the wrapper.
upper(text) is immutable anyway.
But feeding other array types to this function (and casting implicitly or explicitly) may not be immutable!'

Using an SQL standard function (optional). See:

Create a functional index with it (or add a redundant, generated column to the table).

CREATE INDEX tab_col_tsv_gin_idx ON tab USING gin (to_tsvector('simple', arr_to_upper_string(col)));

Then your query can be:

SELECT * FROM tab
WHERE to_tsvector('simple', arr_to_upper_string(col)) @@ to_tsquery('simple', 'ORUNC:*');

fiddle

See:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228