I have four tables; two for current data, two for archive data. One of the archive tables has tens of millions of rows. All tables have a couple narrow indexes and are very similar.
Given the following queries:
SELECT (SELECT COUNT(*) FROM A)
UNION SELECT (SELECT COUNT(*) FROM B)
UNION SELECT (SELECT COUNT(*) FROM C_LargeTable)
UNION SELECT (SELECT COUNT(*) FROM D);
A, B and D perform index scans. C_LargeTable uses a seq scan
and the query takes about 20 seconds to execute. Table D has millions of rows as well, but is only about 10% of the size of C_LargeTable
If I then modify my query to execute using the following logic, which sufficiently narrows counts, I still get the same results, the index is used and the query takes about 5 seconds, or 1/4th of the time
...
SELECT (SELECT COUNT(*) FROM C_LargeTable WHERE idx_col < 'G')
+ (SELECT COUNT(*) FROM C_LargeTable WHERE idx_col BETWEEN 'G' AND 'Q')
+ (SELECT COUNT(*) FROM C_LargeTable WHERE idx_col > 'Q')
...
It does not makes sense to me to have the I/O overhead of a full table scan for a count when perfectly good indexes exist and there is a covering primary key which would ensure uniqueness. My understanding of postgres is that a PRIMARY KEY
isn't like a SQL Server clustering index in that it determines a sort, but it implicitly creates a btree index to ensure uniqueness, which I assume should require significantly less I/O than a full table scan.
Is this potentially an indication of an optimization that I may need to perform to organize data within C_LargeTable?