I'm using postgres version 13.3, and I have multiple views, each constructed with a join between a (foreign) data table and a shared metadata tables that serves as a lookup. each view is constructed like this
CREATE VIEW pg_table_<id>_view AS SELECT
time AS time,
pg_table_<id>.source_id AS source_id,
pg_metadata.source AS source
FROM pg_table_<id>, pg_metadata
WHERE pg_metadata.table_id = (<id>)::INT4 AND pg_metadata.source_id = pg_table_<id>.source_id
pg_table_<id> tables are much larger than pg_metadata table, and when I run a query like
SELECT pg_table_0_view.time, pg_table_0_view.source FROM pg_table_0_view WHERE (pg_table_0_view.source ~ <regex expression>)
It is executed pretty quickly because the regex condition is being "pushed down" into the JOIN filter. here's an example of the query plan constructed:
"QUERY PLAN"
"Subquery Scan on pg_table_0_view (cost=28.01..31.88 rows=5 width=72)"
" Output: pg_table_0_view.time, pg_table_0_view.source"
" -> Hash Join (cost=28.01..31.82 rows=5 width=148)"
" Output: pg_table_0.time, pg_metadata.source"
" Hash Cond: (pg_table_0.source_id = pg_metadata.source_id)"
" -> Foreign Scan on public.pg_table_0 (cost=0.00..0.00 rows=1000 width=16)"
" Output: pg_table_0.time, pg_table_0.source_id"
" -> Hash (cost=28.00..28.00 rows=1 width=36)"
" Output: pg_metadata.source, pg_metadata.source_id"
" -> Seq Scan on public.pg_metadata (cost=0.00..28.00 rows=1 width=36)"
" Output: pg_metadata.source, pg_metadata.source_id"
" Filter: ((pg_metadata.source ~ <regex condition>::text) AND (pg_metadata.table_id = 0))"
But if I run a similar query on a UNION of two such views, then the regex condition isn't being "pushed down", and the query runs much slower. for example such a query:
SELECT pgview9.* FROM (
SELECT query2.* FROM ( (
SELECT pgview3.source, pgview3.time FROM (
SELECT pg_table_0_view.time, pg_table_0_view.source FROM pg_table_0_view
)
AS pgview3
)
UNION ALL (
SELECT pgview5.source, pgview5.time FROM (
SELECT pg_table_1_view.time, pg_table_1_view.source FROM pg_table_1_view
)
AS pgview5
)
) AS query2
) AS pgview9 WHERE (pgview9.source ~ <regex condition>)
results in a query plan that looks something like this:
"QUERY PLAN"
"Subquery Scan on pgview9 (cost=25.07..60.80 rows=1 width=48)"
" Filter: (pgview9.source ~ <regex condition>::text)"
" -> WindowAgg (cost=25.07..60.05 rows=60 width=48)"
" -> Append (cost=25.07..59.30 rows=60 width=40)"
" -> Subquery Scan on pg_table_0_view (cost=25.07..29.50 rows=30 width=40)"
" -> Hash Join (cost=25.07..29.20 rows=30 width=148)"
" Hash Cond: (pg_table_0.source_id = pg_metadata.source_id)"
" -> Foreign Scan on pg_table_0 (cost=0.00..0.00 rows=1000 width=16)"
" -> Hash (cost=25.00..25.00 rows=6 width=36)"
" -> Seq Scan on pg_metadata (cost=0.00..25.00 rows=6 width=36)"
" Filter: (table_id = 0)"
" -> Subquery Scan on pg_table_1_view (cost=25.07..29.50 rows=30 width=40)"
" -> Hash Join (cost=25.07..29.20 rows=30 width=148)"
" Hash Cond: (pg_table_1.source_id = pg_metadata_1.source_id)"
" -> Foreign Scan on pg_table_1 (cost=0.00..0.00 rows=1000 width=16)"
" -> Hash (cost=25.00..25.00 rows=6 width=36)"
" -> Seq Scan on pg_metadata pg_metadata_1 (cost=0.00..25.00 rows=6 width=36)"
" Filter: (table_id = 1)"
I tried to push the condition inside myself (on each part of the union, instead the entire query) and it indeed pushed the regex condition into the join filter as expected, resulting in a query plan like this:
"QUERY PLAN"
"WindowAgg (cost=28.01..64.02 rows=10 width=80)"
" -> Append (cost=28.01..63.90 rows=10 width=72)"
" -> Subquery Scan on pg_table_0_view (cost=28.01..31.88 rows=5 width=72)"
" -> Hash Join (cost=28.01..31.82 rows=5 width=148)"
" Hash Cond: (pg_table_0.source_id = pg_metadata.source_id)"
" -> Foreign Scan on pg_table_0 (cost=0.00..0.00 rows=1000 width=16)"
" -> Hash (cost=28.00..28.00 rows=1 width=36)"
" -> Seq Scan on pg_metadata (cost=0.00..28.00 rows=1 width=36)"
" Filter: ((source ~ <regex condition>::text) AND (table_id = 0))"
" -> Subquery Scan on pg_table_1_view (cost=28.01..31.88 rows=5 width=72)"
" -> Hash Join (cost=28.01..31.82 rows=5 width=148)"
" Hash Cond: (pg_table_1.source_id = pg_metadata_1.source_id)"
" -> Foreign Scan on pg_table_1 (cost=0.00..0.00 rows=1000 width=16)"
" -> Hash (cost=28.00..28.00 rows=1 width=36)"
" -> Seq Scan on pg_metadata pg_metadata_1 (cost=0.00..28.00 rows=1 width=36)"
" Filter: ((source ~ <regex condition>::text) AND (table_id = 1))"
Which runs much faster.
Can I make postgres push the regex condition down into the join filters even if it's defined on the union?