I have a somewhat complex (to me) query where I am joining three tables. I have been steadily trying to optize it, reading how to improve things by looking at the EXPLAIN output.
One of the tables person_deliveries
is growing by one to two million records per day, so the query is taking longer and longer due to my poor optimization. Any insight would be GREATLY appreciated.
Here is the query:
SELECT
DATE(pdel.date) AS date,
pdel.ip_address AS ip_address,
pdel.sending_campaigns_id AS campaigns_id,
(substring_index(pe.email, '@', -1)) AS recipient_domain,
COUNT(DISTINCT(concat(pdel.emails_id, pdel.date))) AS deliveries,
COUNT(CASE WHEN pdel.ip_address = pc.ip_address AND pdel.sending_campaigns_id = pc.campaigns_id AND pdel.emails_id = pc.emails_id THEN pdel.emails_id ELSE NULL END) AS complaints
FROM
person_deliveries pdel
LEFT JOIN person_complaints pc on pc.ip_address = pdel.ip_address
LEFT JOIN person_emails pe ON pe.id = pdel.emails_id
WHERE
(pdel.date >= '2022-03-11' AND pdel.date <= '2022-03-12')
AND pe.id IS NOT NULL
AND pdel.ip_address is NOT NULL
GROUP BY date(pdel.date), pdel.ip_address, pdel.sending_campaigns_id
ORDER BY date(pdel.date), INET_ATON(pdel.ip_address), pdel.sending_campaigns_id ASC ;
Here is the output of EXPLAIN
:
+----+-------------+-------+------------+--------+------------------------------------------------+------------+---------+----------------------------+---------+----------+---------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+------------------------------------------------+------------+---------+----------------------------+---------+----------+---------------------------------------------------------------------+
| 1 | SIMPLE | pdel | NULL | range | person_campaign_date,ip_address,date,emails_id | date | 5 | NULL | 2333678 | 50.00 | Using index condition; Using where; Using temporary; Using filesort |
| 1 | SIMPLE | pe | NULL | eq_ref | PRIMARY | PRIMARY | 4 | subscriber.pdel.emails_id | 1 | 100.00 | NULL |
| 1 | SIMPLE | pc | NULL | ref | ip_address | ip_address | 18 | subscriber.pdel.ip_address | 128 | 100.00 | NULL |
+----+-------------+-------+------------+--------+------------------------------------------------+------------+---------+----------------------------+---------+----------+---------------------------------------------------------------------+
I added a few indexes to get it to this point, but the query still takes an extraordinary amount of resources/time to process.
I know I am missing something here, either an index or using a function that is causing it to be slow, but from everything I have read I haven't figured it out yet.
UPDATE: I neglected to include table info, so I am providing that to be more helpful.
person_deliveries:
CREATE TABLE `person_deliveries` (
`emails_id` int unsigned NOT NULL,
`sending_campaigns_id` int NOT NULL,
`date` datetime NOT NULL,
`vmta` varchar(255) DEFAULT NULL,
`ip_address` varchar(15) DEFAULT NULL,
`sending_domain` varchar(255) DEFAULT NULL,
UNIQUE KEY `person_campaign_date` (`emails_id`,`sending_campaigns_id`,`date`),
KEY `ip_address` (`ip_address`),
KEY `sending_domain` (`sending_domain`),
KEY `sending_campaigns_id` (`sending_campaigns_id`),
KEY `date` (`date`),
KEY `emails_id` (`emails_id`)
person_complaints:
CREATE TABLE `person_complaints` (
`emails_id` int unsigned NOT NULL,
`campaigns_id` int unsigned NOT NULL,
`complaint_datetime` datetime DEFAULT NULL,
`added_datetime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`ip_address` varchar(15) DEFAULT NULL,
`networks_id` int DEFAULT NULL,
`mailing_domains_id` int DEFAULT NULL,
UNIQUE KEY `email_campaign_date` (`emails_id`,`campaigns_id`,`complaint_datetime`),
KEY `ip_address` (`ip_address`)
person_emails:
CREATE TABLE `person_emails` (
`id` int unsigned NOT NULL AUTO_INCREMENT,
`data_providers_id` tinyint unsigned DEFAULT NULL,
`email` varchar(255) NOT NULL,
`email_md5` varchar(255) DEFAULT NULL,
`original_import` timestamp NULL DEFAULT NULL,
`last_import` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `email` (`email`),
KEY `data_providers_id` (`data_providers_id`),
KEY `email_md5` (`email_md5`)
Hopefully this extra info helps.