Hello and thanks in advance to anyone who will take some time to answer me. This is a question to learn how to use indexes efficiently on count queries.
Version Postgresql 12.11
I have a huge table lots
(~15M rows) which should have been partitioned by status
(an integer column with actual values between 0
and 3
), but it has not.
Here is how the data is distributed on it:
-- lots by status
SELECT status, count(*), ROUND((count(*) * 100 / SUM(count(*)) OVER ()), 1) AS "%"
FROM lots
GROUP BY status
ORDER BY count(*) DESC;
status | count | % |
---|---|---|
2 | ~13.3M | 90% |
0 | ~1.5M | 10% |
1 | ~6K | ~0% |
NULL | ~0.5K | ~0% |
I also have those indexes on it:
tablename | indexname | num_rows | table_size | index_size | unique | number_of_scans | tuples_read | tuples_fetched |
---|---|---|---|---|---|---|---|---|
lots | index_lots_on_status | 1.4742644e+07 | 5024 MB | 499 MB | N | 3451 | 7060928281 | 134328966 |
lots | pidx_active_lots_on_id | 1.4742644e+07 | 5024 MB | 38 MB | Y | 23491795 | 1496103827 | 2680228 |
where the pidx_active_lots_on_id
is a partial index defined as follow:
CREATE UNIQUE INDEX CONCURRENTLY "pidx_active_lots_on_id" ON "lots" ("id" DESC) WHERE status = 0;
As you can see, the partial index on lots with status = 0 is "only" 38MB (against the 0.5GB of the full status index).
I've introduced the latter index to try to optimise this query:
SELECT count(*) FROM lots WHERE status = 0;
because the count of the lots on status 0 is the most common count case for that table, but for some reason the index seems to be ignored.
I also tried to perform a more specific query:
SELECT count(id) FROM lots WHERE status = 0;
with this second query, the index is used, but with worst results.
NOTE: I also ran an ANALYSE lots;
after the introduction of the index.
My questions are:
- Why is the index partial index ignored on the first count case (count(*))?
- Why is the second query performing worst?
Detail on plan:
EXPLAIN(ANALYZE, COSTS, VERBOSE, BUFFERS)
SELECT COUNT(*) FROM lots WHERE lots.status = 0
Aggregate (cost=539867.77..539867.77 rows=1 width=8) (actual time=16517.790..16517.791 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=79181 read=287729 dirtied=16606 written=7844
I/O Timings: read=14040.416 write=58.453
-> Index Only Scan using index_lots_on_status on public.lots (cost=0.11..539125.83 rows=1483881 width=0) (actual time=0.498..16238.580 rows=1501060 loops=1)
Output: status
Index Cond: (lots.status = 0)
Heap Fetches: 1545139
Buffers: shared hit=79181 read=287729 dirtied=16606 written=7844
I/O Timings: read=14040.416 write=58.453
Planning Time: 1.856 ms
JIT:
Functions: 3
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 0.466 ms, Inlining 80.076 ms, Optimization 15.797 ms, Emission 12.393 ms, Total 108.733 ms
Execution Time: 16568.670 ms
EXPLAIN(ANALYZE, COSTS, VERBOSE, BUFFERS)
SELECT COUNT(id) FROM lots WHERE lots.status = 0
Aggregate (cost=660337.71..660337.72 rows=1 width=8) (actual time=32127.686..32127.687 rows=1 loops=1)
Output: count(id)
Buffers: shared hit=80426 read=334949 dirtied=3 written=75
I/O Timings: read=11365.273 write=22.365
-> Bitmap Heap Scan on public.lots (cost=11304.17..659595.77 rows=1483887 width=4) (actual time=3783.122..30680.836 rows=1501176 loops=1)
Output: id, url, title, ... *(list of all of the 32 columns)*
Recheck Cond: (lots.status = 0)
Heap Blocks: exact=402865
Buffers: shared hit=80426 read=334949 dirtied=3 written=75
I/O Timings: read=11365.273 write=22.365
-> Bitmap Index Scan on pidx_active_lots_on_id (cost=0.00..11229.97 rows=1483887 width=0) (actual time=2534.845..2534.845 rows=1614888 loops=1)
Buffers: shared hit=4866
Planning Time: 0.248 ms
JIT:
Functions: 5
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 1.170 ms, Inlining 56.485 ms, Optimization 474.508 ms, Emission 205.882 ms, Total 738.045 ms
Execution Time: 32169.349 ms