I'm running into something I cannot explain. I have been googling for a few days now and have not found the cause of my "problem": the PostgreSQL query planner is making a (relatively simple) query take a massive amount of time.


Let's start from the top (I've tried to remove as much useless information as possible, so the tables may look pointless, but trust me, they're not):

I have the following schema:

CREATE TABLE ct_log (
    ID integer,
    CONSTRAINT ctl_pk
        PRIMARY KEY (ID)
);

CREATE TABLE ct_log_entry (
    CERTIFICATE_ID  bigint      NOT NULL,
    ENTRY_ID        bigint      NOT NULL,
    ENTRY_TIMESTAMP timestamp   NOT NULL,
    CT_LOG_ID       integer     NOT NULL,
    CONSTRAINT ctle_ctl_fk
        FOREIGN KEY (CT_LOG_ID)
        REFERENCES ct_log(ID)
) PARTITION BY RANGE (ENTRY_TIMESTAMP);

-- I will not repeat this one 7 times, but there are partitions for each year from 2013-2020
-- (a loop that generates them is sketched below the indexes):
CREATE TABLE ct_log_entry_2020 PARTITION OF ct_log_entry
    FOR VALUES FROM ('2020-01-01T00:00:00'::timestamp) TO ('2021-01-01T00:00:00'::timestamp);

CREATE INDEX ctle_c ON ct_log_entry (CERTIFICATE_ID);
CREATE INDEX ctle_e ON ct_log_entry (ENTRY_ID);
CREATE INDEX ctle_t ON ct_log_entry (ENTRY_TIMESTAMP);
CREATE INDEX ctle_le ON ct_log_entry (CT_LOG_ID, ENTRY_ID DESC);
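
For completeness, here is a sketch of how the remaining yearly partitions can be generated in one go. The loop below is my illustration of the DDL above, not a verbatim copy of what I originally ran:

-- Generate the yearly partitions for 2013-2020 (a sketch):
DO $$
DECLARE y int;
BEGIN
    FOR y IN 2013..2020 LOOP
        EXECUTE format(
            'CREATE TABLE ct_log_entry_%s PARTITION OF ct_log_entry
                 FOR VALUES FROM (%L::timestamp) TO (%L::timestamp)',
            y, y || '-01-01T00:00:00', (y + 1) || '-01-01T00:00:00');
    END LOOP;
END $$;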

(in case you are curious about the full schema: https://github.com/crtsh/certwatch_db/blob/master/sql/create_schema.sql)

And this is the query I am trying to run:

SELECT ctl.ID, latest.entry_id
FROM ct_log ctl
LEFT JOIN LATERAL (
    SELECT coalesce(max(entry_id), -1) entry_id
    FROM ct_log_entry ctle
    WHERE ctle.ct_log_id = ctl.id
) latest ON TRUE;

For people who know https://crt.sh this might look familiar, because this is indeed the schema from crt.sh. That makes things a bit more interesting, since crt.sh provides public PostgreSQL access, allowing me to compare query plans between my own server and theirs.

The difference is quite noticeable, but I'm not sure why: as far as I know I have the correct indexes for this query to be very fast, and the same indexes as the crt.sh server.

It looks like my instance is using a backwards index scan instead of an index-only scan for the two largest partitions. This was not always the case: previously it executed using the same query plan as the crt.sh instance, but at some point the planner stopped doing that.
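
In case it helps, this is how I've been comparing the plans on both servers (a plain EXPLAIN ANALYZE of the query; the BUFFERS option is only there for the I/O numbers):

EXPLAIN (ANALYZE, BUFFERS)
SELECT ctl.ID, latest.entry_id
FROM ct_log ctl
LEFT JOIN LATERAL (
    SELECT coalesce(max(entry_id), -1) entry_id
    FROM ct_log_entry ctle
    WHERE ctle.ct_log_id = ctl.id
) latest ON TRUE;

The per-partition nodes in that output are where the backwards index scan shows up on my instance.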

(This is the amount of data in those tables in case it's not clear from the query plans: https://d.bouma.dev/wUjdXJXk1OzF. I cannot see how much is in the crt.sh database because they don't provide access to the individual partitions)


Now onto the list of things I've tried (the exact commands are sketched after the list):

  • ANALYZE on ct_log_entry (and the ct_log_entry_* tables created by the partitioning)
  • VACUUM ANALYZE on ct_log_entry (and the ct_log_entry_* tables created by the partitioning)
  • VACUUM FULL on ct_log_entry (and the ct_log_entry_* tables created by the partitioning)
  • Dropping the ctle_le index and recreating it (this worked once, giving me a few hours of great performance until I imported more data and it went back to the backwards scan)
  • REINDEX INDEX on the ctle_le index of each ct_log_entry_* table
  • SET random_page_cost = x; with x = 1, 1.1, 4 and 5 (according to many SO answers and blog posts)
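
For clarity, these are (roughly) the commands behind that list, using the 2020 partition as a stand-in for each year. The per-partition index name in the REINDEX is an assumption on my part; \d ct_log_entry_2020 shows the real auto-generated names:

-- Statistics and vacuuming, on the parent and on each partition:
ANALYZE ct_log_entry;
VACUUM ANALYZE ct_log_entry_2020;
VACUUM FULL ct_log_entry_2020;

-- Dropping and recreating the composite index:
DROP INDEX ctle_le;
CREATE INDEX ctle_le ON ct_log_entry (CT_LOG_ID, ENTRY_ID DESC);

-- Reindexing per partition (index name is an example, not verbatim):
REINDEX INDEX ct_log_entry_2020_ct_log_id_entry_id_idx;

-- Planner cost tweak (also tried 1, 4 and 5):
SET random_page_cost = 1.1;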

The only difference I notice is that crt.sh is running PostgreSQL 12.1 and I'm running 12.3, but as far as I can tell that shouldn't have any impact.

Also, before you say "yes, well, but you cannot run this amount of data on your laptop": the server I'm running is a dedicated box with 32 available threads and 128 GB RAM, with eight 2 TB Samsung EVO 860 drives in RAID 5 on a hardware RAID controller (yes, I know this is bad if a drive fails; that's another issue I'll deal with later, but the read performance should be excellent). I don't know what hardware crt.sh is running, but since I only have a fraction of the data imported, I don't see my hardware being the issue here (yet).

I've also "tuned" my config using the guide here: https://pgtune.leopard.in.ua/#/.
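
To give an idea of what that produced, here is the shape of the config it suggests for this box. The numbers below are illustrative for 128 GB RAM and 32 threads, not a verbatim copy of my postgresql.conf:

# postgresql.conf (illustrative values for 128 GB RAM / 32 threads)
shared_buffers = 32GB
effective_cache_size = 96GB
maintenance_work_mem = 2GB
work_mem = 64MB
effective_io_concurrency = 200
max_worker_processes = 32
max_parallel_workers = 32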

Happy to provide more info where needed, but hoping someone can point me to a flaw and/or provide a solution to resolve the problem and show PostgreSQL how to use the optimal path!

Alex
  • No one can tell you what the exact cause of this problem is. However, it feels as if your cost estimations are not good (see your `EXPLAIN ANALYZE`) and as a result, the query planner chooses to (sequentially) scan a large part of the table instead of retrieving random pages using an index-only scan on the index. I suggest trying to increase the `default_statistics_target` (default is 100) to 1000 or higher and run `ANALYZE` again to update all table statistics. (continues in the next comment) – Tom Udding Jul 06 '20 at 09:58
  • Perhaps the statistics are good but the planner estimates are off. You have already altered `random_page_cost` (lowering it would be better for the random page requests when using index-only scans); you might want to try increasing `seq_page_cost` to try to force the planner to prefer the index-only scan. And if that does not change anything, try disabling sequential scans altogether and see what happens. – Tom Udding Jul 06 '20 at 09:58
  • Hi Tom, thanks for giving me some pointers. I have tried increasing the `default_statistics_target` (I had not considered that yet, but it seems logical); unfortunately that did not help. I've also been playing with `seq_page_cost`, but that also doesn't seem to have an effect (increasing it to 4 vs 1.1 for random). And even disabling `seq_scan` had no effect (good idea). [Disabling index scans](https://explain.depesz.com/s/ac3F) did have an effect, but still not good. I did restart the PostgreSQL server between each config change and re-analyzed the tables to make sure every change was applied. (The settings I tried are sketched below the comments.) – Alex Jul 06 '20 at 10:26
  • Can you drop the index which it is choosing to use in the bad plan, and show us the EXPLAIN for the query? The question is: is this an estimation problem (it thinks the wrong index is cheaper) or something else (it thinks the right index is unusable at any cost)? – jjanes Jul 06 '20 at 14:54
  • Good idea! I've dropped the `ctle_e` index (which I forgot to show in my question) and now it uses the correct strategy and index: https://explain.depesz.com/s/dpiI. Now the question is why it thought that one was more efficient, because that index only contains `entry_id` (which it does take the `max()` of); since this happens inside a join, I would have expected the `ct_log_id, entry_id DESC` index (`ct_log_id` being the join key column) to be more efficient either way. – Alex Jul 06 '20 at 17:07
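
For reference, the knobs exercised in this comment thread amount to the following session-level sketch (the values are the ones mentioned above):

SET default_statistics_target = 1000;  -- then re-run ANALYZE to refresh statistics
ANALYZE ct_log_entry;
SET seq_page_cost = 4;                 -- alongside random_page_cost = 1.1
SET enable_seqscan = off;              -- rule out sequential scans entirely
SET enable_indexscan = off;            -- the "disabling index scans" experiment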

0 Answers