-1

How can I optimize this query

WITH stats AS (SELECT a.IntegratorSalesAssociateID,
                      a.AgentName,
                      (
                          SELECT COUNT(*)
                          FROM properties AS p
                          WHERE a.IntegratorSalesAssociateID = p.IntegratorSalesAssociateID
                            AND p.TransactionType = '2'
                            AND MONTH(p.OrigListingDate) = MONTH(CURRENT_DATE)
                            AND YEAR(p.OrigListingDate) = YEAR(CURRENT_DATE)
                      ) AS properties_this_month
               FROM agents AS a)
SELECT stats.*,
       DENSE_RANK() over (ORDER BY stats.properties_this_month DESC) AS 'rank'
from stats

I think maybe if I join the two tables and group them somehow, it would preform much better, currently it runs for 17.5 seconds, oddly, adding the dense_rank does not effect performance at all.

Relevant table structure

CREATE TABLE `agents`
(
    `IntegratorSalesAssociateID` varchar(15) COLLATE utf8mb4_unicode_ci NOT NULL,
    `AgentName`                  varchar(100) COLLATE utf8mb4_unicode_ci     DEFAULT NULL,
    `created_at`                 timestamp                              NULL DEFAULT NULL,
    `updated_at`                 timestamp                              NULL DEFAULT NULL
) ENGINE = InnoDB
  DEFAULT CHARSET = utf8mb4
  COLLATE = utf8mb4_unicode_ci;


CREATE TABLE `properties`
(
    `id`                         bigint(20) UNSIGNED                    NOT NULL,
    `IntegratorSalesAssociateID` varchar(13) COLLATE utf8mb4_unicode_ci NOT NULL,
    `TransactionType`            tinyint(4)                             NOT NULL,
    `OrigListingDate`            date                                        DEFAULT NULL,
    `created_at`                 timestamp                              NULL DEFAULT NULL,
    `updated_at`                 timestamp                              NULL DEFAULT NULL
) ENGINE = InnoDB
  DEFAULT CHARSET = utf8mb4
  COLLATE = utf8mb4_unicode_ci;

phper
  • 307
  • 2
  • 12
  • We know nothing about the underlying table structure, indexes, data or the query plan. How do you think we can optimise this query? – Shadow Feb 09 '21 at 09:14
  • I don't think the actual table structure is important, my question basically is, how to achieve the same result by joining the table and not using a subquery. the actual tables are pretty large and would take over the whole page – phper Feb 09 '21 at 09:17
  • 1
    AND MONTH(p.OrigListingDate) = MONTH(CURRENT_DATE) AND YEAR(p.OrigListingDate) = YEAR(CURRENT_DATE) looks suspicious. I would try torewrite it as p.OrigListingDate BETWEEN AND – Sergey Feb 09 '21 at 09:18
  • Convert correlated subquery to JOIN. Remove CTE usage. – Akina Feb 09 '21 at 09:18
  • 1
    You can include the relevant parts of your tables. – Shadow Feb 09 '21 at 09:19
  • @Sergey, I tried this, query improved by 0.4 seconds... – phper Feb 09 '21 at 09:50
  • Remember, functions cannot use indexes, so on an indexed data set, something like `p.OrigListingDate BETWEEN DATE_FORMAT(CURDATE(),'%Y-%m-01') AND LAST_DAY(CURDATE())` would likely prove more performant (oh, as Sergey already mentioned) – Strawberry Feb 09 '21 at 09:52
  • Indexes... `agents(IntegratorSalesAssociateID, AgentName)` and `properties(TransactionType , IntegratorSalesAssociateID, OrigListingDate)` *(though you should experiment with this as it partially depends on data profile and the variety of queries you're likely to run against it)* – MatBailie Feb 09 '21 at 10:18
  • Sample data, desired results, and an explanation of what the query is supposed to be doing would all help. – Gordon Linoff Feb 09 '21 at 12:45

2 Answers2

0

You can try this:

;WITH stats AS
(
    SELECT 
       p.IntegratorSalesAssociateID
       , COUNT(*) AS properties_this_month
    FROM properties AS p
    WHERE p.TransactionType = '2'
        AND MONTH(p.OrigListingDate) = MONTH(CURRENT_DATE)
        AND YEAR(p.OrigListingDate) = YEAR(CURRENT_DATE)
    GROUP BY p.IntegratorSalesAssociateID
)
SELECT 
   a.IntegratorSalesAssociateID
   , a.AgentName
   , COALESCE(s.properties_this_month, 0) AS properties_this_month
FROM agents AS a
LEFT JOIN stats s ON a.IntegratorSalesAssociateID = s.IntegratorSalesAssociateID
Giorgos Betsos
  • 71,379
  • 9
  • 63
  • 98
0

Given that the DENSE_RANK() doesn't affect performance, you want to optimize:

SELECT a.IntegratorSalesAssociateID,
       a.AgentName,
       (SELECT COUNT(*)
        FROM properties p
        WHERE a.IntegratorSalesAssociateID = p.IntegratorSalesAssociateID AND
              p.TransactionType = '2' AND
              MONTH(p.OrigListingDate) = MONTH(CURRENT_DATE) AND
              YEAR(p.OrigListingDate) = YEAR(CURRENT_DATE)
         ) AS properties_this_month
FROM agents a;

I would rewrite this as:

SELECT a.IntegratorSalesAssociateID,
       a.AgentName,
       (SELECT COUNT(*)
        FROM properties p
        WHERE a.IntegratorSalesAssociateID = p.IntegratorSalesAssociateID AND
              p.TransactionType = 2 AND
              p.OrigListingDate >= CURRENT_DATE - INTERVAL (1 - DAY(CURRENT_DATE) DAY
         ) AS properties_this_month
FROM agents a;

The two changes are:

  • TransactionType looks like a number. Assuming it is, I removed the single quotes. Don't mix data types! Of course, if the column is a string, then use single quotes.
  • I changed the date logic to remove all functions on the column. I assume there are no future "original" dates. If so, you can add another condition for the end of the month.

Then, for this query you want an index on: properties(IntegratorSalesAssociateID, TransactionType, OrigListingDate). Actually, this index might work on the original version of the data.

I sincerely doubt that using an explicit aggregation would improve performance. GROUP BY -- although quite powerful -- is often slower than correlated subqueries. And almost always slower (or at least not faster) with the right indexes.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786