I have PostgreSQL 9.5.9 and two tables: table1 and table2
Column | Type | Modifiers
------------+--------------------------------+-------------------------------------------
id | integer | not null
status | integer | not null
table2_id | integer |
start_date | timestamp(0) without time zone | default NULL::timestamp without time zone
Indexes:
"table1_pkey" PRIMARY KEY, btree (id)
"table1_start_date" btree (start_date)
"table1_table2" btree (table2_id)
Foreign-key constraints:
"fk_t1_t2" FOREIGN KEY (table2_id) REFERENCES table2(id)
Column | Type | Modifiers
--------+-------------------------+---------------------------------
id | integer | not null
name | character varying(2000) | default NULL::character varying
Indexes:
"table2_pkey" PRIMARY KEY, btree (id)
Referenced by:
TABLE "table1" CONSTRAINT "fk_t1_t2" FOREIGN KEY (table2_id) REFERENCES table2(id)
table2 contains only 3 rows; table1 contains about 400000 rows and only half of them have some value in table_2_id column.
The query is fast enough when i select some values from table1 ordered by start_date column because table1_start_date index is effectively used:
SELECT t1.*
FROM table1 AS t1
ORDER BY t1.start_date DESC
LIMIT 25 OFFSET 150000;
EXPLAIN ANALYZE result
Limit (cost=7797.40..7798.70 rows=25 width=20) (actual time=40.994..41.006 rows=25 loops=1)
-> Index Scan Backward using table1_start_date on table1 t1 (cost=0.42..20439.74 rows=393216 width=20) (actual time=0.078..36.251 rows=150025
loops=1)
Planning time: 0.097 ms
Execution time: 41.033 ms
But the query become very slow when i add LEFT JOIN to fetch values from table2 too:
SELECT t1.*, t2.*
FROM table1 AS t1
LEFT JOIN table2 AS t2 ON t2.id = t1.table2_id
ORDER BY t1.start_date DESC
LIMIT 25 OFFSET 150000;
EXPLAIN ANALYZE result:
Limit (cost=33690.80..33696.42 rows=25 width=540) (actual time=191.282..191.320 rows=25 loops=1)
-> Nested Loop Left Join (cost=0.57..88317.50 rows=393216 width=540) (actual time=0.028..184.537 rows=150025 loops=1)
-> Index Scan Backward using table1_start_date on table1 t1 (cost=0.42..20439.74 rows=393216 width=20) (actual time=0.018..35.196 rows=
150025 loops=1)
-> Index Scan using table2_pkey on table2 t2 (cost=0.14..0.16 rows=1 width=520) (actual time=0.000..0.001 rows=1 loops=150025)
Index Cond: (id = t1.table2_id)
Planning time: 0.210 ms
Execution time: 191.357 ms
Why query time increased from 32ms to 191ms? As i understand, LEFT JOIN can not affect to result. So, we can select 25 rows from table1 (LIMIT 25) first and then join rows from table2 Execution time of the query shouldn't significantly increase. There are no some tricky conditions which can break the use of index, etc.
I don't understand completely the EXPLAIN ANALYZE for second query, but it seems like postgres analyzer decided to "perform join and then filter" instead of "filter and then join". In this way the query is too slow. What is the problem?