
I have a table containing about 5 million records and a GIN index on the Barcodes column.

I have this query:

SELECT T1.*
  FROM public."Skus" AS T1
 WHERE (T1."Barcodes" && '{ "THE_BARCODE" }')
 LIMIT 11;

It takes very long (several seconds), and EXPLAIN shows the database consistently running a sequential scan.

Now, if I omit the LIMIT:

SELECT T1.*
  FROM public."Skus" AS T1
 WHERE (T1."Barcodes" && '{ "THE_BARCODE" }');

it takes only a few milliseconds, and EXPLAIN shows an index scan.

Why does this happen, and how can I avoid or work around it?

Note: the queries are dynamically generated by the application, so it isn't easy to remove the `LIMIT` (and sometimes it is actually necessary).

Additional information as specified at: https://wiki.postgresql.org/wiki/Slow_Query_Questions

Postgres version

SELECT version();

gives

PostgreSQL 14.7 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit

Operating system+version

Azure Database for PostgreSQL flexible server

Full Table and Index Schema

CREATE TABLE IF NOT EXISTS public."Skus"
(
    "Id" text COLLATE pg_catalog."default" NOT NULL,
    "Style" text COLLATE pg_catalog."default" NOT NULL,
    "ColorId" text COLLATE pg_catalog."default",
    "DropId" text COLLATE pg_catalog."default",
    "SizeId" text COLLATE pg_catalog."default",
    "LotId" text COLLATE pg_catalog."default",
    "Barcodes" text[] COLLATE pg_catalog."default" NOT NULL,
    "AuthorizationTags" text[] COLLATE pg_catalog."default" NOT NULL,
    "CreatedBy" text COLLATE pg_catalog."default" NOT NULL,
    "CreatedOn" timestamp with time zone NOT NULL,
    "UpdatedBy" text COLLATE pg_catalog."default" NOT NULL,
    "UpdatedOn" timestamp with time zone NOT NULL,
    CONSTRAINT "PK_Skus" PRIMARY KEY ("Id")
)

TABLESPACE pg_default;

ALTER TABLE IF EXISTS public."Skus"
    OWNER to azadmin;
-- Index: IX_Skus_Barcodes

-- DROP INDEX IF EXISTS public."IX_Skus_Barcodes";

CREATE INDEX IF NOT EXISTS "IX_Skus_Barcodes"
    ON public."Skus" USING gin
    ("Barcodes" COLLATE pg_catalog."default")
    WITH (fastupdate=True)
    TABLESPACE pg_default;

Table Metadata

About 5 million rows

SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts, relhassubclass, reloptions, pg_table_size(oid)
  FROM pg_class
 WHERE relname='Skus';

gives:

  • relname Skus
  • relpages 198436
  • reltuples 5.13E+06
  • relallvisible 198312
  • relkind r
  • relnatts 12
  • relhassubclass FALSE
  • reloptions
  • pg_table_size 1626071040

Does the table have anything unusual about it?

  • contains large objects: NO
  • has a large proportion of NULLs in several columns: NO
  • receives a lot of UPDATEs or DELETEs regularly: NO
  • is growing rapidly: NO
  • has many indexes on it: NO
  • uses triggers that may be executing database functions, or is calling functions directly: NO

EXPLAIN (ANALYZE, BUFFERS), not just EXPLAIN

Note: SET track_io_timing = on is prohibited by Azure

EXPLAIN (ANALYZE, BUFFERS, SETTINGS) SELECT T1.*
  FROM public."Skus" AS T1
 WHERE (T1."Barcodes" && '{ "7622336162738" }')
 LIMIT 11;

gives this: https://explain.depesz.com/s/xZj7

EXPLAIN (ANALYZE, BUFFERS, SETTINGS) SELECT T1.*
  FROM public."Skus" AS T1
 WHERE (T1."Barcodes" && '{ "7622336162738" }');

gives this: https://explain.depesz.com/s/vrFZ

History

The query with LIMIT has always been slow; the one without it has always been fast.

Hardware

Azure instance type: Burstable, B2s, 2 vCores, 4 GiB RAM, 32 GiB storage

Hardware benchmark

N/A (Azure)

Maintenance Setup

  • Are you running autovacuum? I don't know

  • If so, with what settings?

  • If not, are you doing manual VACUUM and/or ANALYZE? NO

  • How often?

    SELECT * FROM pg_stat_user_tables WHERE relname='Skus';

gives:

  • relid 33377
  • schemaname public
  • relname Skus
  • seq_scan 24
  • seq_tup_read 96704575
  • idx_scan 30
  • idx_tup_fetch 5126593
  • n_tup_ins 5129364
  • n_tup_upd 0
  • n_tup_del 0
  • n_tup_hot_upd 0
  • n_live_tup 5127368
  • n_dead_tup 2580
  • n_mod_since_analyze 0
  • n_ins_since_vacuum 0
  • last_vacuum 2023-08-01 22:58:49.72915+00
  • last_autovacuum 2023-08-01 22:03:13.66993+00
  • last_analyze 2023-08-01 22:58:56.888834+00
  • last_autoanalyze 2023-08-01 22:03:19.781057+00
  • vacuum_count 1
  • autovacuum_count 7
  • analyze_count 1
  • autoanalyze_count 8
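
Aside: the last_autovacuum and last_autoanalyze timestamps above indicate autovacuum is in fact running. If needed, its effective settings can be listed with something like:

-- list all autovacuum-related settings and their current values
SELECT name, setting
  FROM pg_settings
 WHERE name LIKE 'autovacuum%';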

WAL Configuration

For data writing queries: have you moved the WAL to a different disk? Changed the settings? Not applicable: this is not a writing query.

GUC Settings

SELECT name, current_setting(name), source
  FROM pg_settings
 WHERE source NOT IN ('default', 'override');

gives:

  • name current_setting source
  • archive_command BlobLogUpload.sh %f %p configuration file
  • archive_mode always configuration file
  • archive_timeout 5min configuration file
  • authentication_timeout 30s configuration file
  • azure.extensions HSTORE,POSTGRES_FDW configuration file
  • backend_flush_after 2MB configuration file
  • bgwriter_delay 20ms configuration file
  • bytea_output hex session
  • checkpoint_completion_target 0.9 configuration file
  • checkpoint_timeout 10min configuration file
  • client_encoding UTF8 session
  • client_min_messages notice session
  • cron.enable_superuser_jobs off configuration file
  • cron.host /tmp configuration file
  • DateStyle ISO, MDY session
  • default_text_search_config pg_catalog.english configuration file
  • dynamic_shared_memory_type posix configuration file
  • effective_cache_size 3GB configuration file
  • hot_standby on configuration file
  • jit off configuration file
  • lc_messages en_US.utf8 configuration file
  • lc_monetary en_US.utf-8 configuration file
  • lc_numeric en_US.utf-8 configuration file
  • lc_time en_US.utf8 configuration file
  • listen_addresses * configuration file
  • log_checkpoints on configuration file
  • log_connections on configuration file
  • log_disconnections on configuration file
  • log_line_prefix %t-%c- configuration file
  • log_replication_commands on configuration file
  • log_rotation_age 1h configuration file
  • log_rotation_size 100MB configuration file
  • log_timezone UTC configuration file
  • logging_collector off configuration file
  • maintenance_work_mem 154MB configuration file
  • max_connections 50 configuration file
  • max_parallel_maintenance_workers 64 configuration file
  • max_stack_depth 2MB environment variable
  • max_wal_size 2GB configuration file
  • min_wal_size 80MB configuration file
  • password_encryption md5 configuration file
  • pgaadauth.graph_api_endpoint https://graph.microsoft.com configuration file
  • pgaadauth.issuer_prefix https://sts.windows.net/ configuration file
  • pgaadauth.password_auth_enabled on configuration file
  • random_page_cost 2 configuration file
  • search_path $user, public configuration file
  • shared_buffers 1GB configuration file
  • shared_preload_libraries azure,pg_cron,pg_qs,pg_stat_statements,pgaadauth,pgms_stats,pgms_wait_sampling configuration file
  • synchronous_commit on configuration file
  • tcp_keepalives_count 9 configuration file
  • tcp_keepalives_idle 120 configuration file
  • tcp_keepalives_interval 30 configuration file
  • TimeZone UTC configuration file
  • wal_buffers 16MB configuration file
  • wal_compression on configuration file
  • wal_keep_size 400MB configuration file

Statistics: n_distinct, MCV, histogram

SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, attname, inherited, null_frac, n_distinct, array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1) n_hist, correlation
  FROM pg_stats
 WHERE attname='Barcodes' AND tablename='Skus'
 ORDER BY 1 DESC;

gives:

  • frac_mcv: (null)
  • tablename: Skus
  • attname: Barcodes
  • inherited: FALSE
  • null_frac: 0
  • n_distinct: -1
  • n_mcv: (null)
  • n_hist: 101
  • correlation: -0.060608394
MatteoSp
  • 2,940
  • 5
  • 28
  • 36
  • The excerpt of pg_stats you provided is suitable for scalar attributes, but not for array attributes as you have here. What are in most_common_elems and most_common_elem_freqs for the Barcode attribute? – jjanes Aug 23 '23 at 13:35
  • Can't check now, but: in extremely rare case I have 1 or 3 items within the array, for the rest (99% or more) the array contains 2 items. Every array item can appear in 2 records at most. – MatteoSp Aug 23 '23 at 15:25

1 Answer


The planner thinks there will be 25,637 qualifying rows (0.5% of roughly 5.13 million rows, per the fallback estimate described below) and that it can stop after reading only 11 of them, which would be quite fast if it were true. But it actually has to read the entire table, because there is only 1 qualifying row.

I think you are running into a known problem: pg_stats.most_common_elem_freqs is left NULL in this case, and when the planner sees the NULL it assumes the query will return 0.5% of the table, which is far too high. There was agreement that this is a problem, but no consensus on the best way to solve it, so no solution was ever implemented. If you are willing to compile your own PostgreSQL, you could try one of the approaches discussed in the linked thread and see whether they solve the problem for you.
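
For reference, you could confirm this is what is happening by inspecting the element-level statistics for the column directly; a minimal sketch, using the standard pg_stats columns and the table and column names from this question:

-- if most_common_elem_freqs is NULL here, the planner falls back
-- to the 0.5% selectivity estimate described above
SELECT most_common_elems IS NULL AS elems_missing,
       most_common_elem_freqs
  FROM pg_stats
 WHERE tablename = 'Skus'
   AND attname = 'Barcodes';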

The only practical solution I see is to install pg_hint_plan and provide a hint, /*+ BitmapScan("Skus") */, for the query. But if you can't alter the query text, I don't see how you could inject the hint into it. These hints are very much like Oracle's hints, so if your app aims to be cross-platform it may already have a mechanism for adding them.
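
For illustration, the hinted query might look like the following (a sketch: pg_hint_plan reads hints from a special comment at the start of the query text, and when the table has an alias the hint may need to name the alias, T1 here, rather than the table):

/*+ BitmapScan(T1) */
SELECT T1.*
  FROM public."Skus" AS T1
 WHERE (T1."Barcodes" && '{ "7622336162738" }')
 LIMIT 11;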

jjanes
  • 37,812
  • 5
  • 27
  • 34