Overview
I'm trying to filter on one of two columns depending on the result of a subquery. However, on analysing the queries it shows that the subquery values are evaulated too late - leading to less efficient index or full table scans.
Examples
The query needs to perform a computation once (in this case I'm using common table expressions). Then based on the result of that computation one of two columns should be used in the condition.
I've attempted to illustrate this below in the following toy examples.
What I Want To Work
WITH "polygon" AS (SELECT ST_SetSRID(ST_GeomFromGeoJSON('{"type":"Polygon","coordinates":[[[-0.004,0.004],[0.004,0.004],[0.004,-0.004],[-0.004,-0.004],[-0.004,0.004]]]}'), 4326) AS "polygon"),
"geometry" AS (SELECT ST_Transform("polygon", 900913) AS "geometry", ST_Area("polygon"::GEOGRAPHY) AS "area" FROM "polygon"),
"parameters" AS (
SELECT "area" > 1000000 AS "use_geom_alt"
FROM "geometry"
)
SELECT r.*
FROM "record_record" "r"
WHERE ("record_date" >= '2000-01-01' AND "record_date" <= '2019-12-31')
AND (
--
(NOT (SELECT use_geom_alt FROM parameters) AND ST_Intersects((SELECT geometry FROM geometry), r.geom))
OR ((SELECT use_geom_alt FROM parameters) AND ST_Intersects((SELECT geometry FROM geometry), r.geom_alt))
);
In the above example I'm attempting to short circuit the condition based on the value of use_geom_alt
. The value of use_geom_alt
is FALSE
so ST_Intersects((SELECT geometry FROM geometry), r.geom)
would be evaluated - and preferably using it's index.
(NOT (SELECT use_geom_alt FROM parameters) AND ST_Intersects((SELECT geometry FROM geometry), r.geom))
OR ((SELECT use_geom_alt FROM parameters) AND ST_Intersects((SELECT geometry FROM geometry), r.geom_alt))
However, as can be seen in the query plan below the subquery is evaluated too late to be of use and a less efficient (record_date
) index scan is used.
QUERY PLAN
Gather (cost=1000.54..31522732.07 rows=142 width=4734) (actual time=0.702..897.763 rows=18100 loops=1)
Workers Planned: 2
"Params Evaluated: $2, $3, $4, $5"
Workers Launched: 2
CTE geometry
-> Result (cost=0.00..0.01 rows=1 width=40) (actual time=0.001..0.002 rows=1 loops=1)
CTE parameters
-> CTE Scan on geometry (cost=0.00..0.02 rows=1 width=1) (actual time=0.004..0.005 rows=1 loops=1)
InitPlan 3 (returns $2)
-> CTE Scan on parameters (cost=0.00..0.02 rows=1 width=1) (actual time=0.005..0.006 rows=1 loops=1)
InitPlan 4 (returns $3)
-> CTE Scan on geometry geometry_1 (cost=0.00..0.02 rows=1 width=32) (actual time=0.000..0.001 rows=1 loops=1)
InitPlan 5 (returns $4)
-> CTE Scan on parameters parameters_1 (cost=0.00..0.02 rows=1 width=1) (actual time=0.000..0.000 rows=1 loops=1)
InitPlan 6 (returns $5)
-> CTE Scan on geometry geometry_2 (cost=0.00..0.02 rows=1 width=32) (actual time=0.000..0.001 rows=1 loops=1)
-> Parallel Index Scan using record_record_record_date_idx on record_record r (cost=0.43..31521717.75 rows=59 width=4734) (actual time=0.103..857.236 rows=6033 loops=3)
Index Cond: ((record_date >= '2000-01-01'::date) AND (record_date <= '2019-12-31'::date))
"Filter: (((NOT $2) AND st_intersects($3, geom)) OR ($4 AND st_intersects($5, geom_alt)))"
Rows Removed by Filter: 470543
Planning Time: 0.427 ms
Execution Time: 898.936 ms
How I Wanted It To Work
The following query is the same as above except the use of the (SELECT use_geom_alt FROM parameters)
subquery has been replaced with boolean literals to explicitly short circuit the condition.
WITH "polygon" AS (SELECT ST_SetSRID(ST_GeomFromGeoJSON('{"type":"Polygon","coordinates":[[[-0.004,0.004],[0.004,0.004],[0.004,-0.004],[-0.004,-0.004],[-0.004,0.004]]]}'), 4326) AS "polygon"),
"geometry" AS (SELECT ST_Transform("polygon", 900913) AS "geometry", ST_Area("polygon"::GEOGRAPHY) AS "area" FROM "polygon"),
"parameters" AS (
SELECT "area" > 1000000 AS "use_geom_alt"
FROM "geometry"
)
SELECT r.*
FROM "record_record" "r"
WHERE ("record_date" >= '2000-01-01' AND "record_date" <= '2019-12-31')
AND (
(TRUE AND ST_Intersects((SELECT geometry FROM geometry), r.geom))
OR (FALSE AND ST_Intersects((SELECT geometry FROM geometry), r.geom_alt))
);
This results in the desired query plan.
QUERY PLAN
Index Scan using record_record_geom_idx on record_record r (cost=0.44..5754.90 rows=142 width=4734) (actual time=0.076..53.885 rows=18100 loops=1)
Index Cond: (geom && $1)
"Filter: ((record_date >= '2000-01-01'::date) AND (record_date <= '2019-12-31'::date) AND st_intersects($1, geom))"
Rows Removed by Filter: 6620
CTE geometry
-> Result (cost=0.00..0.01 rows=1 width=40) (actual time=0.001..0.001 rows=1 loops=1)
InitPlan 2 (returns $1)
-> CTE Scan on geometry (cost=0.00..0.02 rows=1 width=32) (actual time=0.003..0.004 rows=1 loops=1)
Planning Time: 0.370 ms
Execution Time: 54.783 ms
In Conclusion
- I would like know to if what I want to achieve is at all possible?
- Alternatively, I realise I may be asking the wrong question and I would be interested in hearing about alternative approaches.