I have two queries that when run separately, each take 1-10 seconds to execute. When I combine their results with UNION ALL
, the execution time jumps up to 4500 seconds (over 1 hour)!
I have run the long query with EXPLAIN ANALYZE
and it appears to iterate over man_jour_tracking
1.6B times - even though that table is only ~11k lines long. It looks to be creating man_jour_tracking
on each join -- but only when the UNION is present, not when run on it's own. Explain Analyze Output
I do not know if it has to do with the json_object
data type. But somehow the planner is not handling this correctly since they execute on their own and all records will be unique.
(I have taken out a variety of select columns to shorten the query since they are not relevant). PostgreSQL 13.7
Running this query alone (first SELECT from union query) take ~5 seconds.
WITH most_recent_journum as (
SELECT
max(journal_number) as journum,
max(journal_date) as jourdate,
max(source_type)
FROM journal
GROUP BY source_id
),
man_jour_tracking as (
SELECT
json_object(Array_agg(tracking_category.name), array_agg(option)) as tracking,
manual_journal_line_id,
manual_journal_id
FROM
manual_journal_line_has_tracking_category line_tracking
LEFT JOIN tracking_category on line_tracking.tracking_category_id = tracking_category.tracking_category_id
GROUP BY manual_journal_line_id, manual_journal_id
)
SELECT
journal.journal_id as journal_id,
journal.journal_number as journal_number,
gross_amount as gross_amount,
line.account_code as account_code,
line.account_name as account_name,
line.account_type as account_type,
line.description as description,
manual_journal.status as status,
trim('"' FROM (man_jour_tracking.tracking -> 'Tracking')::text) as tracking,
trim('"' FROM (man_jour_tracking.tracking -> 'Events Tracking')::text) as events_tracking,
NULL as invoice_number,
NULL as contact_name,
manual_journal.narration as source_description
FROM journal
LEFT JOIN journal_line line on line.journal_id = journal.journal_id
LEFT JOIN manual_journal ON journal.source_id = manual_journal.manual_journal_id
LEFT JOIN manual_journal_line manual_line ON
manual_line.manual_journal_id = manual_journal.manual_journal_id AND line.account_code = manual_line.account_code AND line.description = manual_line.description AND line.gross_amount = manual_line.line_amount
LEFT JOIN
man_jour_tracking ON
manual_journal.manual_journal_id = man_jour_tracking.manual_journal_id AND man_jour_tracking.manual_journal_line_id::int = manual_line.line
WHERE
journal.source_type = 'MANJOURNAL' AND
journal.journal_number in (select journum from most_recent_journum) AND
journal.journal_date AT TIME ZONE 'UTC' >= '2022-10-01'::date AND journal_date AT TIME ZONE 'UTC' < '2023-01-01'::date AND
line.journal_line_id is not null AND
line.account_code <> '2650' AND
manual_journal.status = 'POSTED'
Running this query alone (second SELECT from union query) takes ~5 seconds
WITH most_recent_journum as (
SELECT
max(journal_number) as journum,
max(journal_date) as jourdate,
max(source_type)
FROM journal
GROUP BY source_id
),
man_jour_tracking as (
SELECT
json_object(Array_agg(tracking_category.name), array_agg(option)) as tracking,
manual_journal_line_id,
manual_journal_id
FROM
manual_journal_line_has_tracking_category line_tracking
LEFT JOIN tracking_category on line_tracking.tracking_category_id = tracking_category.tracking_category_id
GROUP BY manual_journal_line_id, manual_journal_id
)
SELECT
journal.journal_id as journal_id,
journal.journal_number as journal_number,
NULL::int as source_line,
gross_amount as gross_amount,
line.account_code as account_code,
line.account_name as account_name,
line.account_type as account_type,
line.description as description,
invoice.status as status,
NULL as tracking,
NULL as events_tracking,
invoice.invoice_number,
contact.name as contact_name,
NULL as source_description
FROM journal
LEFT JOIN journal_line line on line.journal_id = journal.journal_id
LEFT JOIN invoice ON journal.source_id = invoice.invoice_id
LEFT JOIN contact on invoice.contact_id = contact.contact_id
WHERE
journal.source_type in ('ACCPAY', 'ACCREC') AND
journal.journal_number in (select journum from most_recent_journum) AND
journal.journal_date AT TIME ZONE 'UTC' >= '2022-10-01'::date AND journal_date AT TIME ZONE 'UTC' < '2023-01-01'::date AND
line.journal_line_id is not null AND
invoice.status not in ('VOIDED', 'DELETED')
But when you combine them together, the query takes over 1 hour.
WITH most_recent_journum as (
SELECT
max(journal_number) as journum,
max(journal_date) as jourdate,
max(source_type)
FROM journal
GROUP BY source_id
),
man_jour_tracking as (
SELECT
json_object(Array_agg(tracking_category.name), array_agg(option)) as tracking,
manual_journal_line_id,
manual_journal_id
FROM
manual_journal_line_has_tracking_category line_tracking
LEFT JOIN tracking_category on line_tracking.tracking_category_id = tracking_category.tracking_category_id
GROUP BY manual_journal_line_id, manual_journal_id
)
SELECT
journal.journal_id as journal_id,
journal.journal_number as journal_number,
gross_amount as gross_amount,
line.account_code as account_code,
line.account_name as account_name,
line.account_type as account_type,
line.description as description,
manual_journal.status as status,
trim('"' FROM (man_jour_tracking.tracking -> 'Tracking')::text) as tracking,
trim('"' FROM (man_jour_tracking.tracking -> 'Events Tracking')::text) as events_tracking,
NULL as invoice_number,
NULL as contact_name,
manual_journal.narration as source_description
FROM journal
LEFT JOIN journal_line line on line.journal_id = journal.journal_id
LEFT JOIN manual_journal ON journal.source_id = manual_journal.manual_journal_id
LEFT JOIN manual_journal_line manual_line ON
manual_line.manual_journal_id = manual_journal.manual_journal_id AND line.account_code = manual_line.account_code AND line.description = manual_line.description AND line.gross_amount = manual_line.line_amount
LEFT JOIN
man_jour_tracking ON
manual_journal.manual_journal_id = man_jour_tracking.manual_journal_id AND man_jour_tracking.manual_journal_line_id::int = manual_line.line
WHERE
journal.source_type = 'MANJOURNAL' AND
journal.journal_number in (select journum from most_recent_journum) AND
journal.journal_date AT TIME ZONE 'UTC' >= '2022-10-01'::date AND journal_date AT TIME ZONE 'UTC' < '2023-01-01'::date AND
line.journal_line_id is not null AND
line.account_code <> '2650' AND
manual_journal.status = 'POSTED'
UNION ALL
SELECT
journal.journal_id as journal_id,
journal.journal_number as journal_number,
NULL::int as source_line,
gross_amount as gross_amount,
line.account_code as account_code,
line.account_name as account_name,
line.account_type as account_type,
line.description as description,
invoice.status as status,
NULL as tracking,
NULL as events_tracking,
invoice.invoice_number,
contact.name as contact_name,
NULL as source_description
FROM journal
LEFT JOIN journal_line line on line.journal_id = journal.journal_id
LEFT JOIN invoice ON journal.source_id = invoice.invoice_id
LEFT JOIN contact on invoice.contact_id = contact.contact_id
WHERE
journal.source_type in ('ACCPAY', 'ACCREC') AND
journal.journal_number in (select journum from most_recent_journum) AND
journal.journal_date AT TIME ZONE 'UTC' >= '2022-10-01'::date AND journal_date AT TIME ZONE 'UTC' < '2023-01-01'::date AND
line.journal_line_id is not null AND
invoice.status not in ('VOIDED', 'DELETED')
Any help is much appreciated!
Things I have tried:
- Separated
man_jour_tracking
out from a CTE into a temp table (CREATE TEMP TABLE man_jour_tracking
) and created an index (CREATE INDEX idx_man_jour_tracking ON man_jour_tracking (manual_journal_id, manual_journal_line_id)
) in hopes of speeding up the join or forcing the planner to cache the joined table. No faster. - Switched order of SELECT statements within union
- Replacing
UNION ALL
withUNION
- no help. - Even remove
man_jour_tracking
CTE entirely and still takes ages longer when combined than when separate.
EDIT - SOLVED
Thanks to @jjanes, creating the CTE's with as NOT MATERIALIZED
fixed it! I do think it's a bit counter-intuitive because you would want the CTE cached, however perhaps it removes a critical index that substantially slows down the joins. Another post here for more context.