I have a table containing about 5 million records and a GIN index on Barcodes
column.
I have this query:
SELECT T1.*
FROM public."Skus" AS T1
WHERE (T1."Barcodes" && '{ "THE_BARCODE" }')
LIMIT 11;
It takes very long (seconds) and the explain shows me the db is constantly running a seq scan
.
Now, if I omit the LIMIT
:
SELECT T1.*
FROM public."Skus" AS T1
WHERE (T1."Barcodes" && '{ "THE_BARCODE" }');
it takes only a few milliseconds and the explain shows me an index scan
.
Why? How to avoid/overcome this?
Note: queries are dynamically generated by application, so isn't easy to remove the `LIMIT' (and sometimes it is even necessary).
Additional information as specified at: https://wiki.postgresql.org/wiki/Slow_Query_Questions
Postgres version
SELECT version();
gives
PostgreSQL 14.7 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit
Operating system+version
Azure Database for PostgreSQL flexible server
Full Table and Index Schema
CREATE TABLE IF NOT EXISTS public."Skus"
(
"Id" text COLLATE pg_catalog."default" NOT NULL,
"Style" text COLLATE pg_catalog."default" NOT NULL,
"ColorId" text COLLATE pg_catalog."default",
"DropId" text COLLATE pg_catalog."default",
"SizeId" text COLLATE pg_catalog."default",
"LotId" text COLLATE pg_catalog."default",
"Barcodes" text[] COLLATE pg_catalog."default" NOT NULL,
"AuthorizationTags" text[] COLLATE pg_catalog."default" NOT NULL,
"CreatedBy" text COLLATE pg_catalog."default" NOT NULL,
"CreatedOn" timestamp with time zone NOT NULL,
"UpdatedBy" text COLLATE pg_catalog."default" NOT NULL,
"UpdatedOn" timestamp with time zone NOT NULL,
CONSTRAINT "PK_Skus" PRIMARY KEY ("Id")
)
TABLESPACE pg_default;
ALTER TABLE IF EXISTS public."Skus"
OWNER to azadmin;
-- Index: IX_Skus_Barcodes
-- DROP INDEX IF EXISTS public."IX_Skus_Barcodes";
CREATE INDEX IF NOT EXISTS "IX_Skus_Barcodes"
ON public."Skus" USING gin
("Barcodes" COLLATE pg_catalog."default")
WITH (fastupdate=True)
TABLESPACE pg_default;
Table Metadata
About 5 milions rows
SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts, relhassubclass, reloptions, pg_table_size(oid)
FROM pg_class
WHERE relname='TABLE_NAME'
gives:
- relname Skus
- relpages 198436
- reltuples 5,13E+12
- relallvisible 198312
- relkind r
- relnatts 12
- relhassubclass FALSE
- reloptions
- pg_table_size 1626071040
Does the table have anything unusual about it?
- contains large objects: NO
- has a large proportion of NULLs in several columns: NO
- receives a lot of UPDATEs or DELETEs regularly: NO
- is growing rapidly: NO
- has many indexes on it: NO
- uses triggers that may be executing database functions, or is calling functions directly: NO
EXPLAIN (ANALYZE, BUFFERS), not just EXPLAIN
Note: SET track_io_timing = on
is prohibited by Azure
EXPLAIN (ANALYZE, BUFFERS, SETTINGS) SELECT T1.*
FROM public."Skus" AS T1
WHERE (T1."Barcodes" && '{ "7622336162738" }')
LIMIT 11;
gives this: https://explain.depesz.com/s/xZj7
EXPLAIN (ANALYZE, BUFFERS, SETTINGS) SELECT T1.*
FROM public."Skus" AS T1
WHERE (T1."Barcodes" && '{ "7622336162738" }');
gives this: https://explain.depesz.com/s/vrFZ
History
The query with LIMIT has always been slow, the other one has always been fast
Hardware
Azure instance type: Burstable, B2s, 2 vCores, 4 GiB RAM, 32 GiB storage
Hardware benchmark
N/A (Azure)
Maintenance Setup
Are you running autovacuum? I don't know
If so, with what settings?
If not, are you doing manual VACUUM and/or ANALYZE? NO
How often?
SELECT * FROM pg_stat_user_tables WHERE relname='table_name
gives:
- relid 33377
- schemaname public
- relname Skus
- seq_scan 24
- seq_tup_read 96704575
- idx_scan 30
- idx_tup_fetch 5126593
- n_tup_ins 5129364
- n_tup_upd 0
- n_tup_del 0
- n_tup_hot_upd 0
- n_live_tup 5127368
- n_dead_tup 2580
- n_mod_since_analyze 0
- n_ins_since_vacuum 0
- last_vacuum 2023-08-01 22:58:49.72915+00
- last_autovacuum 2023-08-01 22:03:13.66993+00
- last_analyze 2023-08-01 22:58:56.888834+00
- last_autoanalyze 2023-08-01 22:03:19.781057+00
- vacuum_count 1
- autovacuum_count 7
- analyze_count 1
- autoanalyze_count 8
WAL Configuration
For data writing queries: have you moved the WAL to a different disk? Changed the settings? Not a writing query
GUC Settings
SELECT name, current_setting(name), source
FROM pg_settings
WHERE source NOT IN ('default', 'override');
gives:
- name current_setting source
- archive_command BlobLogUpload.sh %f %p configuration file
- archive_mode always configuration file
- archive_timeout 5min configuration file
- authentication_timeout 30s configuration file
- azure.extensions HSTORE,POSTGRES_FDW configuration file
- backend_flush_after 2MB configuration file
- bgwriter_delay 20ms configuration file
- bytea_output hex session
- checkpoint_completion_target 0.9 configuration file
- checkpoint_timeout 10min configuration file
- client_encoding UTF8 session
- client_min_messages notice session
- cron. enable_superuser_jobs off configuration file
- cron. host /tmp configuration file
- DateStyle ISO, MDY session
- default_text_search_config pg_catalog.english configuration file
- dynamic_shared_memory_type posix configuration file
- effective_cache_size 3GB configuration file
- hot_standby on configuration file
- jit off configuration file
- lc_messages en_US.utf8 configuration file
- lc_monetary en_US.utf-8 configuration file
- lc_numeric en_US.utf-8 configuration file
- lc_time en_US.utf8 configuration file
- listen_addresses * configuration file
- log_checkpoints on configuration file
- log_connections on configuration file
- log_disconnections on configuration file
- log_line_prefix %t-%c- configuration file
- log_replication_commands on configuration file
- log_rotation_age 1h configuration file
- log_rotation_size 100MB configuration file
- log_timezone UTC configuration file
- logging_collector off configuration file
- maintenance_work_mem 154MB configuration file
- max_connections 50 configuration file
- max_parallel_maintenance_workers 64 configuration file
- max_stack_depth 2MB environment variable
- max_wal_size 2GB configuration file
- min_wal_size 80MB configuration file
- password_encryption md5 configuration file
- pgaadauth. graph_api_endpoint https://graph.microsoft.com configuration file
- pgaadauth. issuer_prefix https://sts.windows.net/ configuration file
- pgaadauth. password_auth_enabled on configuration file
- random_page_cost 2 configuration file
- search_path $user, public configuration file
- shared_buffers 1GB configuration file
- shared_preload_libraries azure,pg_cron,pg_qs,pg_stat_statements,pgaadauth,pgms_stats,pgms_wait_sampling configuration file
- synchronous_commit on configuration file
- tcp_keepalives_count 9 configuration file
- tcp_keepalives_idle 120 configuration file
- tcp_keepalives_interval 30 configuration file
- TimeZone UTC configuration file
- wal_buffers 16MB configuration file
- wal_compression on configuration file
- wal_keep_size 400MB configuration file
Statistics: n_distinct, MCV, histogram
SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, attname, inherited, null_frac, n_distinct, array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1) n_hist, correlation
FROM pg_stats
WHERE attname='...' AND tablename='...'
ORDER BY 1 DESC;
gives:
- frac_mcv:
- tablename: Skus
- attname: Barcodes
- inherited: FALSE
- null_frac: 0
- n_distinct: -1
- n_mcv:
- n_hist: 101
- correlation: -0.060608394