
I have four tables; two for current data, two for archive data. One of the archive tables has tens of millions of rows. All tables have a couple of narrow indexes and are very similar in structure.

Given the following queries:

SELECT (SELECT COUNT(*) FROM A)
UNION SELECT (SELECT COUNT(*) FROM B)
UNION SELECT (SELECT COUNT(*) FROM C_LargeTable)
UNION SELECT (SELECT COUNT(*) FROM D);

A, B and D perform index scans. C_LargeTable uses a sequential scan, and the query takes about 20 seconds to execute. Table D has millions of rows as well, but is only about 10% of the size of C_LargeTable.

If I then modify the query to split the count into ranges as follows, I still get the same results, the index is used, and the query takes about 5 seconds, or a quarter of the original time:

...
SELECT (SELECT COUNT(*) FROM C_LargeTable WHERE idx_col < 'G') 
       + (SELECT COUNT(*) FROM C_LargeTable WHERE idx_col BETWEEN 'G' AND 'Q')
       + (SELECT COUNT(*) FROM C_LargeTable WHERE idx_col > 'Q')
...

It does not make sense to me to have the I/O overhead of a full table scan for a count when perfectly good indexes exist and there is a covering primary key which would ensure uniqueness. My understanding of Postgres is that a PRIMARY KEY isn't like a SQL Server clustered index in that it doesn't determine a physical sort order, but it does implicitly create a btree index to enforce uniqueness, which I assume should require significantly less I/O to scan than the full table.
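
For what it's worth, the implicit btree index from the primary key does show up alongside the named indexes; a quick check via pg_indexes (a sketch, using my table name; unquoted identifiers are stored lowercase):

SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'c_largetable';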

Is this potentially an indication of an optimization that I may need to perform to organize data within C_LargeTable?

Alan Samet
  • Which version of Postgres are you using? Only 9.2 and higher would be able to use an index. This is also a FAQ: https://wiki.postgresql.org/wiki/FAQ#Why_is_.22SELECT_count.28.2A.29_FROM_bigtable.3B.22_slow.3F –  Dec 04 '13 at 21:55
  • I am running 9.3. Your response explained why this is occurring. Is there a recommendation for making it do something else? I attempted setting `enable_seqscan=false` and it didn't seem to yield much difference in performance (I set it back to `true` afterwards). – Alan Samet Dec 04 '13 at 22:08
  • You might also want to read: https://wiki.postgresql.org/wiki/Slow_Query_Questions –  Dec 04 '13 at 22:24
  • `EXPLAIN ANALYZE` output for the query with and without `enable_seqscan = false` please. Also, does it change after you run `VACUUM ANALYZE` on the table? Always include `SELECT version()` in your questions please, and see http://stackoverflow.com/tags/postgresql/info for guidance on better questions. – Craig Ringer Dec 04 '13 at 23:20
  • Thank you for being patient with me. I executed `VACUUM ANALYZE C_LargeTable`. The `EXPLAIN ANALYZE` output then showed archive table D switching to a seq scan as well. I then executed `VACUUM ANALYZE D`, ran the query again, and it returned to an `Index Only Scan` – Alan Samet Dec 05 '13 at 23:18
  • I also tested `enable_seqscan=false;` and compared it to the results of `enable_seqscan=true;`. Under `false`, the query executed in approximately 9 seconds vs 13 seconds when `enable_seqscan=true` is set. My understanding of Postgres is that you should only have `enable_seqscan=false` when testing this sort of problem (the test pattern is sketched after these comments) – Alan Samet Dec 05 '13 at 23:28
  • @Alan Correct, the `enable` params are really only for testing purposes. I suspect your `random_page_cost` is too high for your hardware and setup. – Craig Ringer Dec 08 '13 at 00:38
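
For reference, the `enable_seqscan` test mentioned in the comments above looks roughly like this; it only discourages seq scans for the current session, it never disables them outright:

SET enable_seqscan = off;  -- planner strongly avoids seq scans; session-local setting
EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM C_LargeTable;
RESET enable_seqscan;      -- restore the default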

1 Answer

There isn't a covering index on the primary key because PostgreSQL doesn't support covering indexes (true up to and including 9.4, anyway).

The heap scan is required because of MVCC visibility: the index doesn't contain visibility information. Pg can do an index scan, but it still has to check each row's visibility against the heap, and with an index scan that means random I/O across the whole table, so a seqscan will be much faster.

Make sure you run 9.2 or newer, and that autovacuum is configured to run frequently on the table. You should then be able to get an index-only scan, where the visibility map is used. This only works under limited circumstances, as Horse notes; see the wiki pages on count and on index-only scans. If you aren't letting autovacuum run regularly enough, the visibility map will be outdated and Pg won't be able to do an index-only scan.
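
As a sketch, assuming the table name from the question, you can watch the visibility map catch up after a vacuum; the closer relallvisible is to relpages, the cheaper an index-only scan becomes:

VACUUM ANALYZE C_LargeTable;  -- updates the visibility map and planner statistics

SELECT relpages, relallvisible
FROM pg_class
WHERE relname = 'c_largetable';  -- unquoted identifiers are stored lowercase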

In future, make sure you post `EXPLAIN` or preferably `EXPLAIN ANALYZE` output with any query performance questions.
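
As an aside, if an estimate is good enough, the count wiki page mentioned above describes reading the planner's row estimate instead of counting; a minimal sketch (only as fresh as the last VACUUM or ANALYZE):

SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'c_largetable';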

Craig Ringer
  • "The index doesn't contain visibility information." That makes perfect sense! With that said, is there a way to perform a count without caring whether data is committed or not? – Alan Samet Dec 05 '13 at 23:30
  • @Alan No, there isn't, because it's not just uncommitted data - it's also deleted rows, old versions of updated rows, etc. You can get an approximation from `pg_statistic` if your autovacuum is running often enough, though; see the link to the count wiki page above. – Craig Ringer Dec 06 '13 at 02:10
  • For clarification, before I check this answer off: the issue that I'm experiencing is a result of the design of PostgreSQL. If there were a real requirement for me to perform fast counting, I assume it's standard practice to persist such data elsewhere and update that count by, say, incrementing the persisted value in an INSERT row trigger and decrementing it in a DELETE trigger (a sketch of that approach follows these comments)? – Alan Samet Dec 07 '13 at 21:45
  • @Alan That's correct - however, doing that has terrible consequences for performance of concurrent `INSERT`s and/or `DELETE`s, as they're all serialized on your row-count lock. It's pretty hard to have fast counts, good insert/delete concurrency, and proper transaction isolation/visibility; it's one of those "pick two" kind of things. Pg certainly doesn't behave wonderfully in that area. – Craig Ringer Dec 08 '13 at 00:36
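
For reference, a minimal sketch of the trigger-maintained counter discussed in the last two comments; the row_counts table and count_trig function are made-up names for illustration, and as noted above, every concurrent writer serializes on the counter row:

CREATE TABLE row_counts (table_name text PRIMARY KEY, n bigint NOT NULL);

-- seed the counter once, before concurrent writes begin
INSERT INTO row_counts SELECT 'c_largetable', COUNT(*) FROM C_LargeTable;

CREATE FUNCTION count_trig() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    UPDATE row_counts SET n = n + 1 WHERE table_name = TG_TABLE_NAME;
    RETURN NEW;
  ELSE  -- DELETE
    UPDATE row_counts SET n = n - 1 WHERE table_name = TG_TABLE_NAME;
    RETURN OLD;
  END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER c_largetable_count
  AFTER INSERT OR DELETE ON C_LargeTable
  FOR EACH ROW EXECUTE PROCEDURE count_trig();

Note that TRUNCATE bypasses row-level triggers, so the counter would need re-seeding after one.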