
Hi, I have a database with 10 tables. Each table has ~0.5-1 billion rows and is partitioned by range and then hash (10x10 = 100 partitions). The tables are newly created, with data imported from CSV files, and they are indexed on the column (id) used in the queries below. The database is hosted on an Azure PostgreSQL single server.

A test query shows that most of the time is spent on "I/O Timings: read":

postgres=> EXPLAIN (ANALYZE, BUFFERS) select count(*) from table_id4 where id=244730;
                                                                          QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=7458.09..7458.10 rows=1 width=8) (actual time=21141.393..21141.396 rows=1 loops=1)
   Buffers: shared read=2256
   I/O Timings: read=21096.814
   ->  Append  (cost=41.26..7452.66 rows=2171 width=0) (actual time=197.168..21138.495 rows=2247 loops=1)
         Buffers: shared read=2256
         I/O Timings: read=21096.814
         ->  Bitmap Heap Scan on table_id4_r2_h5  (cost=41.26..7441.80 rows=2171 width=0) (actual time=197.167..21137.471 rows=2247 loops=1)
               Recheck Cond: (id = 244730)
               Heap Blocks: exact=2247
               Buffers: shared read=2256
               I/O Timings: read=21096.814
               ->  Bitmap Index Scan on table_id4_r2_h5_id_idx  (cost=0.00..40.72 rows=2171 width=0) (actual time=117.586..117.586 rows=2247 loops=1)
                     Index Cond: (id = 244730)
                     Buffers: shared read=9
                     I/O Timings: read=116.929
 Planning Time: 2.882 ms
 Execution Time: 21141.449 ms
(17 rows)

I also tried ANALYZE table_name and then SET enable_bitmapscan = off; according to this post. The query plan is now an index-only scan, but it is still sometimes slow.
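
Concretely, what I ran was along these lines (a sketch; table_id2 is the table from the plan below):

ANALYZE table_id2;
SET enable_bitmapscan = off;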

postgres=> EXPLAIN (ANALYZE, BUFFERS) select count(*) from table_id2 where id=179863;
                                                                                       QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=5528.78..5528.79 rows=1 width=8) (actual time=24273.748..24273.749 rows=1 loops=1)
   Buffers: shared read=1943
   I/O Timings: read=24218.684
   ->  Append  (cost=0.42..5523.93 rows=1940 width=0) (actual time=83.735..24271.326 rows=1959 loops=1)
         Buffers: shared read=1943
         I/O Timings: read=24218.684
         ->  Index Only Scan using table_id2_r10_h5_id_idx on table_id2_r10_h5  (cost=0.42..5514.23 rows=1940 width=0) (actual time=83.734..24270.157 rows=1959 loops=1)
               Index Cond: (id = 179863)
               Heap Fetches: 1959
               Buffers: shared read=1943
               I/O Timings: read=24218.684
 Planning Time: 3.254 ms
 Execution Time: 24273.787 ms
(13 rows)

[UPDATE] To answer @jjanes's question on I/O speed: this database is a "replication" of an existing Azure DB that can run the same queries on the same tables in <1 second, as analyzed in my last post. The difference I notice in the EXPLAIN output is that the old DB shows only Buffers: shared hit=4479 read=21, i.e. fewer reads and more hits. This Azure DB has the same price tier as the old one: Single Server, Basic, both with "IOPS: Variable". I only upgraded the storage from 600GB to 1TB, so I think I/O speed may not be the issue?
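
For reference, the cumulative hit/read balance per table can also be checked with something like this (my assumption of a quick check using the standard pg_statio_user_tables view, not anything Azure-specific):

SELECT relname, heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit
FROM pg_statio_user_tables
WHERE relname LIKE 'table_id4%'
ORDER BY relname;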

I did a batch test, which shows quite varied query times within the same loop. The test is roughly the following PL/pgSQL (total_IDs stands for the upper bound of the id range):

DO $$
DECLARE idx int; n bigint; t0 timestamptz;
BEGIN
  FOR idx IN SELECT (random()*total_IDs)::int FROM generate_series(1,10) LOOP
    t0 := clock_timestamp();
    SELECT count(*) INTO n FROM table_id4 WHERE id = idx;
    RAISE NOTICE '  id: % count#: %,   time: %s', idx, n, extract(epoch from clock_timestamp() - t0);
  END LOOP;
END $$;
NOTICE:   id: 321158 count#: 2154,   time: 46.734967s
NOTICE:   id: 487596 count#: 2238,   time: 0.968759s
NOTICE:   id: 548334 count#: 2180,   time: 1.062516s
NOTICE:   id: 404978 count#: 2179,   time: 29.750295s
NOTICE:   id: 370904 count#: 2123,   time: 22.203384s
NOTICE:   id: 228857 count#: 2223,   time: 29.094126s
NOTICE:   id: 327134 count#: 2169,   time: 24.750242s
NOTICE:   id: 372101 count#: 2180,   time: 28.062825s
NOTICE:   id: 341814 count#: 2130,   time: 30.250353s
NOTICE:   id: 248316 count#: 2195,   time: 32.375377s

But if I repeat the queries on the same ids, the times drop to the ideal millisecond range:

psql -c " ...
select count(*) from table_id4 where id=321158;
select count(*) from table_id4 where id=487596;
select count(*) from table_id4 where id=548334;
select count(*) from table_id4 where id=404978;
select count(*) from table_id4 where id=370904;
select count(*) from table_id4 where id=228857;
"
Time: 5267.168 ms (00:05.267)
Time: 171.925 ms
Time: 24.942 ms
Time: 11.387 ms
Time: 6.753 ms
Time: 17.573 ms

Other tables behave similarly for these queries. Here are the table's details:

postgres=> \d+ table_id4
                          Unlogged table "table_id4"
   Column    |   Type   | Collation | Nullable | Default | Storage | Stats target | Description
-------------+----------+-----------+----------+---------+---------+--------------+-------------
 date        | date     |           | not null |         | plain   |              |
 field1      | real     |           |          |         | plain   |              |
 field2      | real     |           |          |         | plain   |              |
 field3      | smallint |           |          |         | plain   |              |
 id          | integer  |           | not null |         | plain   |              |
Partition key: RANGE (id)
Indexes:
    "table_id4_date_idx" btree (date)
    "table_id4_id_idx" btree (id)
Partitions: table_id4_r1 FOR VALUES FROM (0) TO (1...5), PARTITIONED,
            table_id4_r10 FOR VALUES FROM (...93) TO (MAXVALUE), PARTITIONED,
            table_id4_r2 FOR VALUES FROM (1...) TO (3...), PARTITIONED,
            table_id4_r3 FOR VALUES FROM (3...) TO (4...), PARTITIONED,
            table_id4_r4 FOR VALUES FROM (4...) TO (6...), PARTITIONED,
            table_id4_r5 FOR VALUES FROM (6...) TO (7...), PARTITIONED,
            table_id4_r6 FOR VALUES FROM (7...) TO (9...), PARTITIONED,
            table_id4_r7 FOR VALUES FROM (9...) TO (1...1), PARTITIONED,
            table_id4_r8 FOR VALUES FROM (1...1) TO (...32), PARTITIONED,
            table_id4_r9 FOR VALUES FROM (1...2) TO (...93), PARTITIONED
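
The range partitions are themselves hash-partitioned into 10 pieces each; a leaf such as table_id4_r2_h5 was created with DDL roughly like this (placeholder bounds, the real ones are elided above):

CREATE UNLOGGED TABLE table_id4_r2 PARTITION OF table_id4
    FOR VALUES FROM (100000) TO (300000)   -- placeholder bounds
    PARTITION BY HASH (id);

CREATE UNLOGGED TABLE table_id4_r2_h5 PARTITION OF table_id4_r2
    FOR VALUES WITH (MODULUS 10, REMAINDER 5);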

I have seen some similar discussions here and here, but I am not very familiar with databases and these tables are pretty large, so I would like to understand the question better before running REINDEX or VACUUM etc., which may take days to complete.

The PostgreSQL server resource monitor on the Azure portal shows MAX usage (MAX usage in a given time window, not averaged over time) of [storage, memory, cpu]: 55%, 30%, <5%, so it seems resources are not the issue? Some server parameters:

CPU: vCore 2
total memory: 4GB
storage: 1TB
shared_buffers: 512MB
work_mem: 4MB  (changed to 256MB, but it did not help)
max_parallel_workers: 10
max_parallel_maintenance_workers: 8

Would ALTER-ing the tables to LOGGED, or running ENABLE TRIGGER ALL, be helpful?
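
(By these I mean commands along the following lines; SET LOGGED presumably has to be run per leaf partition, since the partitioned parent has no storage of its own:)

ALTER TABLE table_id4_r2_h5 SET LOGGED;
ALTER TABLE table_id4 ENABLE TRIGGER ALL;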

[update]: PostgreSQL version 11

Happy Life
  • You are getting about 10ms per IO. That is not good. Is this consistent with what Azure promises you for the class of machine and storage you are renting from them? Are you using the cheapest thing you could find? – jjanes Apr 26 '23 at 20:35
  • @jjanes, thank you for the help! The current Azure DB type is: [Single Server, Basic](https://learn.microsoft.com/en-us/azure/postgresql/single-server/concepts-pricing-tiers#storage), with "IOPS Variable". However, this database is a "replication" of an existing Azure DB that can usually query in <1 second except the 1st time, as analyzed in my [last](https://stackoverflow.com/questions/75700453/postgresql-slow-1st-query-to-each-table?noredirect=1#comment133547512_75700453) post. The difference I notice is that the old DB shows only `Buffers: shared hit=4479 read=21`, i.e. fewer **reads** and more **hits** – Happy Life Apr 26 '23 at 21:02
  • You are essentially hampered by slow disk storage. That is what you can expect with a hosted database unless you are ready to spend big money. – Laurenz Albe Apr 27 '23 at 06:28
  • No, you definitely have an IO problem; it is hard to interpret the high 'I/O Timings' any other way. In your previous question, I figured it probably wasn't I/O, just because your version is too old to give direct evidence of IO timing during planning, and because I had a hard time believing your IO (plus caching) could suck that much, but here you have conclusive evidence that it does. I'm not an Azure user, but I'm guessing "variable" means "so embarrassing we don't want to say it out loud". – jjanes Apr 27 '23 at 14:55
  • @Laurenz Albe, jjanes, thanks both! :-) I will see when I am able to upgrade from an "embarrassing" tier to a "regular/decent" tier and check whether the performance gets better. – Happy Life Apr 28 '23 at 23:33

1 Answer


so I would like to understand the question better before running REINDEX or VACUUM etc., which may take days to complete

You are going to need a vacuum to happen at some point anyway, so I don't see why you should delay doing one just as a learning exercise. A REINDEX, on the other hand, might never need to happen, and it will lock your table while it is going on, so I wouldn't preemptively do one of those.

I guess you could wonder why autovacuum isn't already taking care of it, and maybe a manual vacuum would destroy evidence around that question? Also, since a manual vacuum is (by default) unthrottled, doing one could lead to performance problems for other queries by using up the available resources.

For this case, I would expect it to be doing an index-only scan, and the only reason I can think of why it would not is that your pg_class.relallvisible for the table is very low relative to relpages (which in turn probably indicates that vacuum is not doing what it should). There are two ways to assess this other than running a preemptive vacuum. You could directly query pg_class and see what it says for relallvisible and relpages for the relevant tables, or you could force it to do an index-only scan and see what it reports for heap fetches in the EXPLAIN (ANALYZE, BUFFERS) when it does one. For the latter, doing set enable_bitmapscan=off will probably suffice to get it to do the index-only scan.
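
For the former, something like this should show it (a sketch; adjust the relname pattern to match your partition naming):

SELECT relname, relpages, relallvisible
FROM pg_class
WHERE relname LIKE 'table_id4_r%'
ORDER BY relname;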

I see you added the bitmap-off plan while I was typing, and indeed the heap fetches count is as high as the number of rows returned, so the visibility map is not tuned up, and a vacuum is what you need to make that happen. Since you are using an old version (autovacuum_vacuum_insert_scale_factor was only introduced in v13) and these are fresh tables (inserted/copied but never updated or deleted), a manual vacuum is the only way to get them vacuumed.
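
Something along these lines per parent table should be enough (a sketch; a plain VACUUM updates the visibility map, and VERBOSE at least shows which partitions have been processed):

VACUUM (ANALYZE, VERBOSE) table_id4;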

the PostgreSQL server resource monitor on the Azure portal shows MAX usage of [cpu,memory,storage]: 55%, 30%, <5%

I'm not familiar with this Azure portal. Is "storage" referring to capacity of storage (before it runs out of space), or performance of storage? If performance, does it refer to byte throughput (where 100% can only be obtained by sequential reads) or IOPS? If the latter, then maybe you would benefit from increasing effective_io_concurrency.
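
For example (an illustrative value, assuming SSD-class storage; this setting mainly helps prefetching in bitmap heap scans):

SET effective_io_concurrency = 200;   -- default is 1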

jjanes
  • thank you for the detailed answer! I clarified the Azure resource usage in the main post; basically I think resources are not the bottleneck. Following your suggestion, I started a `vacuum` on the tables with the command `VACUUM FREEZE ANALYZE table_name`, but it has been running for days, from when I saw your reply until now, and is still not finished, and there is no way for me to track the progress % either. In my post I mentioned that `ANALYZE table_name` did not work previously. Is this the right command to optimize the query plan? – Happy Life Apr 28 '23 at 23:19
  • I upgraded the DB on Azure to a "general purpose" tier, and the performance is largely improved. It is still not ideal, but I will live with it. Thank you for putting so much effort into answering my question. I tried to cast a vote for your answer, but I don't have enough "reputation" :-) – Happy Life May 05 '23 at 16:14