I have a complex query:
SELECT DISTINCT ON (delivery.id)
delivery.id, dl_processing.pid
FROM mailer.mailer_message_recipient_rel AS delivery
JOIN mailer.mailer_message AS message ON delivery.message_id = message.id
JOIN mailer.mailer_message_recipient_rel_log AS dl_processing ON dl_processing.rel_id = delivery.id AND dl_processing.status = 1000
-- LEFT JOIN mailer.mailer_recipient AS r ON delivery.email = r.email
JOIN mailer.mailer_mailing AS mailing ON message.mailing_id = mailing.id
WHERE
NOT EXISTS (SELECT dl_finished.id FROM mailer.mailer_message_recipient_rel_log AS dl_finished WHERE dl_finished.rel_id = delivery.id AND dl_finished.status <> 1000) AND
dl_processing.date <= NOW() - (36000 * INTERVAL '1 second') AND
NOT EXISTS (SELECT ml.id FROM mailer.mailer_message_log AS ml WHERE ml.message_id = message.id) AND
-- (r.times_bounced < 5 OR r.times_bounced IS NULL) AND
NOT EXISTS (SELECT ur.id FROM mailer.mailer_unsubscribed_recipient AS ur WHERE ur.email = delivery.email AND ur.list_id = mailing.list_id)
ORDER BY delivery.id, dl_processing.id DESC
LIMIT 1000;
It is running very slowly, and the reason seems to be that Postgres is consistently avoiding using merge joins in its query plan despite me having all the indices that I would need for this. It looks really depressing:
http://explain.depesz.com/s/tVY
https://i.stack.imgur.com/Myw4R.png
Why would this happen? How do I troubleshoot such an issue?
UPD: with @wildplasser's help I have reworked the query to fix performance (while changing its semantics somewhat):
SELECT delivery.id, dl_processing.pid
FROM mailer.mailer_message_recipient_rel AS delivery
JOIN mailer.mailer_message AS message ON delivery.message_id = message.id
JOIN mailer.mailer_message_recipient_rel_log AS dl_processing ON dl_processing.rel_id = delivery.id AND dl_processing.status in (1000, 2, 5) AND dl_processing.date <= NOW() - (36000 * INTERVAL '1 second')
LEFT JOIN mailer.mailer_recipient AS r ON delivery.email = r.email
WHERE
(r.times_bounced < 5 OR r.times_bounced IS NULL) AND
NOT EXISTS (SELECT dl_other.id FROM mailer.mailer_message_recipient_rel_log AS dl_other WHERE dl_other.rel_id = delivery.id AND dl_other.id > dl_processing.id) AND
NOT EXISTS (SELECT ml.id FROM mailer.mailer_message_log AS ml WHERE ml.message_id = message.id) AND
NOT EXISTS (SELECT ur.id FROM mailer.mailer_unsubscribed_recipient AS ur JOIN mailer.mailer_mailing AS mailing ON message.mailing_id = mailing.id WHERE ur.email = delivery.email AND ur.list_id = mailing.list_id)
ORDER BY delivery.id
LIMIT 1000
It now runs well, but the query plan still sports these horrible nested loop joins <_<:
http://explain.depesz.com/s/MTo3
I would still like to know why that is.