Question
My query (simple with a few joins) runs extremely slow when I have a small amount of data (~50k rows) but runs fast when I have a bigger amount of data (~180k rows). The time difference is huge since it is from a few seconds to almost half an hour.
Attempts
I have re-checked the joins and they are all correct. In addition, I have run a VACUUM ANALYZE
to the table before running the query but it didn't solve anything. I also checked whether there were locks that were blocking the query in any way or the connectivity was in any case slow but they are not the case of the faults.
Therefore, I went to check the output of the EXPLAIN
. After reading the outcome, I see that in the slow case, it makes unnecessary extra sortings and it gets stuck in a nested for loop that is non existent in the case where I have way more data. I'm not sure how to tell postgres to do the same plan as with the bigger dataset scenario.
Based on a comment, I also tried not to use CTEs but it is not helping either: still makes the nested loops and the sortings.
Details:
- Postgres version:
PostgreSQL 12.3
- Full query text:
WITH t0 AS (SELECT * FROM original_table WHERE id=0),
t1 AS (SELECT * FROM original_table WHERE id=1),
t2 AS (SELECT * FROM original_table WHERE id=2),
t3 AS (SELECT * FROM original_table WHERE id=3),
t4 AS (SELECT * FROM original_table WHERE id=4)
SELECT
t0.dtime,
t1.dtime,
t3.dtime,
t3.dtime::date,
t4.dtime,
t1.first_id,
t1.field,
t1.second_id,
t1.third_id,
t2.fourth_id,
t4.fourth_id
FROM t1
LEFT JOIN t0 ON t1.first_id=t0.first_id
JOIN t2 ON t1.first_id=t2.first_id AND t1.second_id = t2.second_id AND t1.third_id = t2.third_id
JOIN t3 ON t1.first_id=t3.first_id AND t1.second_id = t3.second_id AND t1.third_id = t3.third_id
JOIN t4 ON t1.first_id=t4.first_id AND t1.second_id = t4.second_id AND t1.fourth_id= t4.third_id
ORDER BY t3.dtime
;
- Table definition:
Column | Type
----------+----------------------------
id | smallint
dtime | timestamp without time zone
first_id | character varying(10)
second_id | character varying(10)
third_id | character varying(10)
fourth_id | character varying(10)
field | character varying(10)
- Cardinality: slow case ~50k, fast case ~180k
- Query plans: output of
EXPLAIN (BUFFERS, ANALYZE)
for the two cases - slow case https://explain.depesz.com/s/5JDw, fast case: https://explain.depesz.com/s/JMIL - Additional info: the relevant memory configuration is:
name | current_setting | source
---------------+-----------------+---------------------
max_stack_dept | 2MB | environment variable
max_wal_size | 1GB | configuration file
min_wal_size | 80MB | configuration file
shared_buffers | 128MB | configuration file