Bad query plan for small dataset makes it very slow

Question

Question

My query (simple with a few joins) runs extremely slow when I have a small amount of data (~50k rows) but runs fast when I have a bigger amount of data (~180k rows). The time difference is huge since it is from a few seconds to almost half an hour.

Attempts

I have re-checked the joins and they are all correct. In addition, I have run a VACUUM ANALYZE to the table before running the query but it didn't solve anything. I also checked whether there were locks that were blocking the query in any way or the connectivity was in any case slow but they are not the case of the faults.

Therefore, I went to check the output of the EXPLAIN. After reading the outcome, I see that in the slow case, it makes unnecessary extra sortings and it gets stuck in a nested for loop that is non existent in the case where I have way more data. I'm not sure how to tell postgres to do the same plan as with the bigger dataset scenario.

Based on a comment, I also tried not to use CTEs but it is not helping either: still makes the nested loops and the sortings.

Details:

Postgres version: PostgreSQL 12.3
Full query text:

WITH t0 AS (SELECT * FROM original_table WHERE id=0),
     t1 AS (SELECT * FROM original_table WHERE id=1),
     t2 AS (SELECT * FROM original_table WHERE id=2),
     t3 AS (SELECT * FROM original_table WHERE id=3),
     t4 AS (SELECT * FROM original_table WHERE id=4)
 
SELECT
    t0.dtime,
    t1.dtime,
    t3.dtime,
    t3.dtime::date,
    t4.dtime,
    t1.first_id,
    t1.field,
    t1.second_id,
    t1.third_id,
    t2.fourth_id,
    t4.fourth_id
FROM t1
LEFT JOIN t0 ON t1.first_id=t0.first_id
     JOIN t2 ON t1.first_id=t2.first_id AND t1.second_id = t2.second_id AND t1.third_id = t2.third_id
     JOIN t3 ON t1.first_id=t3.first_id AND t1.second_id = t3.second_id AND t1.third_id = t3.third_id
     JOIN t4 ON t1.first_id=t4.first_id AND t1.second_id = t4.second_id AND t1.fourth_id= t4.third_id
ORDER BY t3.dtime
;

Table definition:

Column    |            Type
----------+----------------------------
id        | smallint
dtime     | timestamp without time zone
first_id  | character varying(10)
second_id | character varying(10)
third_id  | character varying(10)
fourth_id | character varying(10)
field     | character varying(10)

Cardinality: slow case ~50k, fast case ~180k
Query plans: output of EXPLAIN (BUFFERS, ANALYZE) for the two cases - slow case https://explain.depesz.com/s/5JDw, fast case: https://explain.depesz.com/s/JMIL
Additional info: the relevant memory configuration is:

      name     | current_setting |        source
---------------+-----------------+---------------------
max_stack_dept | 2MB             | environment variable
max_wal_size   | 1GB             | configuration file
min_wal_size   | 80MB            | configuration file
shared_buffers | 128MB           | configuration file

What if you don't use CTEs in this case? They aren't really needed for this — , Sep 15 '20 at 07:58
Originally I had views t0,...,t4 but I don't really see how to do the query without CTEs or creating views. — Jzbach, Sep 15 '20 at 08:20
Oh, how could I be so blind hahahah! But no, it doesn't help. It still has the same nested loops and takes forever... — Jzbach, Sep 15 '20 at 09:22

score 1 · Accepted Answer · answered Sep 15 '20 at 09:54

This often happens to me on SQL-Server.
What usually causes the slowness, is that it executes the CTE once per row joined.

You can prevent that from happening by selecting into temp-tables, instead of using CTEs.
I assume the same is true for PostgreSQL, but I didn't test it:

DROP TABLE IF EXISTS tempT0;
DROP TABLE IF EXISTS tempT1;
CREATE TEMP TABLE tempT0 AS SELECT * FROM original_table WHERE id=0; 
CREATE TEMP TABLE tempT1 AS SELECT * FROM original_table WHERE id=1; 
[... etc]

FROM tempT1 AS t1
LEFT JOIN tempT0 AS t0 ON  t1.first_id=t0.first_id

Thank you very much Stefan. I suspected that with more data, the temp tables were created on the background but I didn't think about explicitly creating them when using not so much data. It completely solved my issue! :) — Jzbach, Sep 15 '20 at 11:59

Bad query plan for small dataset makes it very slow

Question

Attempts

Details:

1 Answers1