I am running a query with an INNER JOIN where the planner decides to use a Nested Loop. I've figured out that it has do with the WHERE conditions as I have tried writing the query with different WHERE conditions so it returns the same result but does not use a Nested Loop.
My question is why has the planner decided to make the different decisions when the queries appear to be identical as they both return the same result? The query runs in 77 secs with the Nested Loop and in 13 sec without, and the query that runs in 13 sec is quite ugly and inelegant making me think there is a better way to write it.
Here are the two queries. Note that the difference between the two is how the WHERE clause filters by date where the first uses BETWEEN and the second uses a series of OR statements. I am aware that it's strange that current_date is wrapped in their own subqueries but that is because these queries are using foreign data wrappers. This allows current_date to be passed as an immutable object to greatly speed up performance.
SELECT ROUND(AVG(m.forecast - w.wind),6) from pjm.wind_forecast_recent w
INNER JOIN pjm.load_forecast_recent m ON w.pricedate = m.pricedate AND w.hour = m.hour
WHERE w.hour = 5 AND m.area = 'RTO_COMBINED' AND
(w.pricedate BETWEEN (SELECT current_date-6) AND (SELECT current_date));
-----------
SELECT ROUND(AVG(m.forecast - w.wind),6) from pjm.wind_forecast_recent w
INNER JOIN pjm.load_forecast_recent m ON w.pricedate = m.pricedate AND w.hour = m.hour
WHERE w.hour = 5 AND m.area = 'RTO_COMBINED' AND (
w.pricedate = (SELECT current_date-6) OR
w.pricedate = (SELECT current_date-5) OR
w.pricedate = (SELECT current_date-4) OR
w.pricedate = (SELECT current_date-3) OR
w.pricedate = (SELECT current_date-2) OR
w.pricedate = (SELECT current_date-1) OR
w.pricedate = (SELECT current_date))
And here are the respective EXPLAIN ANALYZE:
Aggregate (cost=842341.01..842341.02 rows=1 width=32) (actual time=77120.088..77120.089 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=1)
InitPlan 2 (returns $1)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
-> Nested Loop (cost=840333.25..842340.97 rows=1 width=18) (actual time=14719.661..77119.994 rows=7 loops=1)
-> Foreign Scan on wind_forecast_recent w (cost=242218.45..242218.49 rows=1 width=18) (actual time=3184.714..3184.720 rows=7 loops=1)
-> Foreign Scan on load_forecast_recent m (cost=598114.80..600122.47 rows=1 width=16) (actual time=10531.723..10531.724 rows=1 loops=7)
Planning Time: 744.979 ms
Execution Time: 77227.512 ms
Aggregate (cost=841657.94..841657.95 rows=1 width=32) (actual time=13683.022..13683.023 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=1)
InitPlan 2 (returns $1)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
InitPlan 3 (returns $2)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
InitPlan 4 (returns $3)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
InitPlan 5 (returns $4)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
InitPlan 6 (returns $5)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
InitPlan 7 (returns $6)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
-> Foreign Scan (cost=833725.15..841657.83 rows=1 width=18) (actual time=13682.974..13682.977 rows=7 loops=1)
Relations: (pjm.wind_forecast_recent w) INNER JOIN (pjm.load_forecast_recent m)
Planning Time: 332.870 ms
JIT:
Functions: 16
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 4.163 ms, Inlining 15.088 ms, Optimization 44.489 ms, Emission 28.064 ms, Total 91.804 ms
Execution Time: 13724.094 ms
I am running PostgreSQL 12.1 on an Ubuntu 18.04 server.
Let me know if you have any further questions. Thanks!