
I have a table containing about 5 million records and a GIN index on the Barcodes column.

I have this query:

SELECT T1.*
  FROM public."Skus" AS T1
 WHERE (T1."Barcodes" && '{ "THE_BARCODE" }')
 LIMIT 11;

It takes very long (several seconds), and EXPLAIN shows the database consistently running a sequential scan.

Now, if I omit the LIMIT:

SELECT T1.*
  FROM public."Skus" AS T1
 WHERE (T1."Barcodes" && '{ "THE_BARCODE" }');

it takes only a few milliseconds, and EXPLAIN shows an index scan.

Why does this happen, and how can I avoid or work around it?

Note: the queries are dynamically generated by the application, so it isn't easy to remove the `LIMIT` (and sometimes it is actually necessary).

Additional information as specified at: https://wiki.postgresql.org/wiki/Slow_Query_Questions

Postgres version

SELECT version();

gives

PostgreSQL 14.7 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit

Operating system+version

Azure Database for PostgreSQL flexible server

Full Table and Index Schema

CREATE TABLE IF NOT EXISTS public."Skus"
(
    "Id" text COLLATE pg_catalog."default" NOT NULL,
    "Style" text COLLATE pg_catalog."default" NOT NULL,
    "ColorId" text COLLATE pg_catalog."default",
    "DropId" text COLLATE pg_catalog."default",
    "SizeId" text COLLATE pg_catalog."default",
    "LotId" text COLLATE pg_catalog."default",
    "Barcodes" text[] COLLATE pg_catalog."default" NOT NULL,
    "AuthorizationTags" text[] COLLATE pg_catalog."default" NOT NULL,
    "CreatedBy" text COLLATE pg_catalog."default" NOT NULL,
    "CreatedOn" timestamp with time zone NOT NULL,
    "UpdatedBy" text COLLATE pg_catalog."default" NOT NULL,
    "UpdatedOn" timestamp with time zone NOT NULL,
    CONSTRAINT "PK_Skus" PRIMARY KEY ("Id")
)

TABLESPACE pg_default;

ALTER TABLE IF EXISTS public."Skus"
    OWNER to azadmin;
-- Index: IX_Skus_Barcodes

-- DROP INDEX IF EXISTS public."IX_Skus_Barcodes";

CREATE INDEX IF NOT EXISTS "IX_Skus_Barcodes"
    ON public."Skus" USING gin
    ("Barcodes" COLLATE pg_catalog."default")
    WITH (fastupdate=True)
    TABLESPACE pg_default;

Table Metadata

About 5 million rows

SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts, relhassubclass, reloptions, pg_table_size(oid)
  FROM pg_class
 WHERE relname='Skus';

gives:

  • relname Skus
  • relpages 198436
  • reltuples 5.13E+06
  • relallvisible 198312
  • relkind r
  • relnatts 12
  • relhassubclass FALSE
  • reloptions
  • pg_table_size 1626071040

Does the table have anything unusual about it?

  • contains large objects: NO
  • has a large proportion of NULLs in several columns: NO
  • receives a lot of UPDATEs or DELETEs regularly: NO
  • is growing rapidly: NO
  • has many indexes on it: NO
  • uses triggers that may be executing database functions, or is calling functions directly: NO

EXPLAIN (ANALYZE, BUFFERS), not just EXPLAIN

Note: SET track_io_timing = on is prohibited by Azure

EXPLAIN (ANALYZE, BUFFERS, SETTINGS) SELECT T1.*
  FROM public."Skus" AS T1
 WHERE (T1."Barcodes" && '{ "7622336162738" }')
 LIMIT 11;

gives this: https://explain.depesz.com/s/xZj7

EXPLAIN (ANALYZE, BUFFERS, SETTINGS) SELECT T1.*
  FROM public."Skus" AS T1
 WHERE (T1."Barcodes" && '{ "7622336162738" }');

gives this: https://explain.depesz.com/s/vrFZ

History

The query with LIMIT has always been slow; the one without it has always been fast.

Hardware

Azure instance type: Burstable, B2s, 2 vCores, 4 GiB RAM, 32 GiB storage

Hardware benchmark

N/A (Azure)

Maintenance Setup

  • Are you running autovacuum? I don't know

  • If so, with what settings?

  • If not, are you doing manual VACUUM and/or ANALYZE? NO

  • How often?

    SELECT * FROM pg_stat_user_tables WHERE relname='Skus';

gives:

  • relid 33377
  • schemaname public
  • relname Skus
  • seq_scan 24
  • seq_tup_read 96704575
  • idx_scan 30
  • idx_tup_fetch 5126593
  • n_tup_ins 5129364
  • n_tup_upd 0
  • n_tup_del 0
  • n_tup_hot_upd 0
  • n_live_tup 5127368
  • n_dead_tup 2580
  • n_mod_since_analyze 0
  • n_ins_since_vacuum 0
  • last_vacuum 2023-08-01 22:58:49.72915+00
  • last_autovacuum 2023-08-01 22:03:13.66993+00
  • last_analyze 2023-08-01 22:58:56.888834+00
  • last_autoanalyze 2023-08-01 22:03:19.781057+00
  • vacuum_count 1
  • autovacuum_count 7
  • analyze_count 1
  • autoanalyze_count 8
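
Aside: the last_autovacuum and last_autoanalyze timestamps above indicate autovacuum is in fact running. If needed, its effective settings can be listed with something like:

-- list all autovacuum-related settings and their current values
SELECT name, setting
  FROM pg_settings
 WHERE name LIKE 'autovacuum%';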

WAL Configuration

For data writing queries: have you moved the WAL to a different disk? Changed the settings? Not applicable: this is not a writing query.

GUC Settings

SELECT name, current_setting(name), source
  FROM pg_settings
 WHERE source NOT IN ('default', 'override');

gives:

  • name current_setting source
  • archive_command BlobLogUpload.sh %f %p configuration file
  • archive_mode always configuration file
  • archive_timeout 5min configuration file
  • authentication_timeout 30s configuration file
  • azure.extensions HSTORE,POSTGRES_FDW configuration file
  • backend_flush_after 2MB configuration file
  • bgwriter_delay 20ms configuration file
  • bytea_output hex session
  • checkpoint_completion_target 0.9 configuration file
  • checkpoint_timeout 10min configuration file
  • client_encoding UTF8 session
  • client_min_messages notice session
  • cron.enable_superuser_jobs off configuration file
  • cron.host /tmp configuration file
  • DateStyle ISO, MDY session
  • default_text_search_config pg_catalog.english configuration file
  • dynamic_shared_memory_type posix configuration file
  • effective_cache_size 3GB configuration file
  • hot_standby on configuration file
  • jit off configuration file
  • lc_messages en_US.utf8 configuration file
  • lc_monetary en_US.utf-8 configuration file
  • lc_numeric en_US.utf-8 configuration file
  • lc_time en_US.utf8 configuration file
  • listen_addresses * configuration file
  • log_checkpoints on configuration file
  • log_connections on configuration file
  • log_disconnections on configuration file
  • log_line_prefix %t-%c- configuration file
  • log_replication_commands on configuration file
  • log_rotation_age 1h configuration file
  • log_rotation_size 100MB configuration file
  • log_timezone UTC configuration file
  • logging_collector off configuration file
  • maintenance_work_mem 154MB configuration file
  • max_connections 50 configuration file
  • max_parallel_maintenance_workers 64 configuration file
  • max_stack_depth 2MB environment variable
  • max_wal_size 2GB configuration file
  • min_wal_size 80MB configuration file
  • password_encryption md5 configuration file
  • pgaadauth.graph_api_endpoint https://graph.microsoft.com configuration file
  • pgaadauth.issuer_prefix https://sts.windows.net/ configuration file
  • pgaadauth.password_auth_enabled on configuration file
  • random_page_cost 2 configuration file
  • search_path $user, public configuration file
  • shared_buffers 1GB configuration file
  • shared_preload_libraries azure,pg_cron,pg_qs,pg_stat_statements,pgaadauth,pgms_stats,pgms_wait_sampling configuration file
  • synchronous_commit on configuration file
  • tcp_keepalives_count 9 configuration file
  • tcp_keepalives_idle 120 configuration file
  • tcp_keepalives_interval 30 configuration file
  • TimeZone UTC configuration file
  • wal_buffers 16MB configuration file
  • wal_compression on configuration file
  • wal_keep_size 400MB configuration file

Statistics: n_distinct, MCV, histogram

SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, attname, inherited, null_frac, n_distinct, array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1) n_hist, correlation
  FROM pg_stats
 WHERE attname='Barcodes' AND tablename='Skus'
 ORDER BY 1 DESC;

gives:

  • frac_mcv: (null)
  • tablename: Skus
  • attname: Barcodes
  • inherited: FALSE
  • null_frac: 0
  • n_distinct: -1
  • n_mcv: (null)
  • n_hist: 101
  • correlation: -0.060608394
MatteoSp
  • 2,940
  • 5
  • 28
  • 36
  • The excerpt of pg_stats you provided is suitable for scalar attributes, but not for array attributes as you have here. What are in most_common_elems and most_common_elem_freqs for the Barcode attribute? – jjanes Aug 23 '23 at 13:35
  • Can't check now, but: in extremely rare case I have 1 or 3 items within the array, for the rest (99% or more) the array contains 2 items. Every array item can appear in 2 records at most. – MatteoSp Aug 23 '23 at 15:25

1 Answer


The planner thinks there will be 25,637 qualifying rows (0.5% of roughly 5.13 million rows, per the fallback estimate described below) and that it can stop after reading only 11 of them, which would be quite fast if it were true. But it actually has to read the entire table, because there is only 1 qualifying row.

I think you are running into a known problem: pg_stats.most_common_elem_freqs is left NULL in this case, and when the planner sees the NULL it assumes the query will return 0.5% of the table, which is far too high. There was agreement that this is a problem, but no consensus on the best way to solve it, so no solution was ever implemented. If you are willing to compile your own PostgreSQL, you could try one of the approaches discussed in the linked thread and see whether they solve the problem for you.
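
For reference, you could confirm this is what is happening by inspecting the element-level statistics for the column directly; a minimal sketch, using the standard pg_stats columns and the table and column names from this question:

-- if most_common_elem_freqs is NULL here, the planner falls back
-- to the 0.5% selectivity estimate described above
SELECT most_common_elems IS NULL AS elems_missing,
       most_common_elem_freqs
  FROM pg_stats
 WHERE tablename = 'Skus'
   AND attname = 'Barcodes';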

The only practical solution I see is to install pg_hint_plan and provide a hint, /*+ BitmapScan("Skus") */, for the query. But if you can't alter the query text, I don't see how you could inject the hint into it. These hints are very much like Oracle's hints, so if your app aims to be cross-platform it may already have a mechanism for adding them.
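
For illustration, the hinted query might look like the following (a sketch: pg_hint_plan reads hints from a special comment at the start of the query text, and when the table has an alias the hint may need to name the alias, T1 here, rather than the table):

/*+ BitmapScan(T1) */
SELECT T1.*
  FROM public."Skus" AS T1
 WHERE (T1."Barcodes" && '{ "7622336162738" }')
 LIMIT 11;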

jjanes
  • 37,812
  • 5
  • 27
  • 34