MySQL: Optimize greatest row in group query

Question

I have table Addresses with following schema:

CREATE TABLE `Address` (
    ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    ID_Person INT NOT NULL,
    Street VARCHAR(50),
    HouseNumber VARCHAR(10),
    City VARCHAR(50),
    Zipcode CHAR(5),
    Country CHAR(2),
    Version INT,
    ValidFrom DATE,
    ValidTo DATE,
);

Each address belongs to user (ID_Person) and each user can have multiple addreses. Each address has Version, which is INT, one user can have multiple addresses with same version. I want to retrieve address with the highest Version for each user. In case of multiple highest versions for one user, address can be chosen randomly from highest values.

What I have done, is following:

I get highest version of address for each user:

CREATE VIEW vw_highest_addresses AS
SELECT 
    ID,
    ID_Person,
    Max(Version) AS Version
FROM
    Address
WHERE ValidTo IS NULL OR ValidTo < NOW()
GROUP BY ID_Person

And then join addreses based on combination of ID_Person and Version

CREATE VIEW vw_addresses AS
SELECT 
    a.ID,
    a.ID_Person,
    a.Street,
    a.HouseNumber,
    a.City,
    a.Zipcode,
    a.Country,
    a.Version
FROM
    Address AS a, vw_primary_addresses AS aprimary
WHERE
    a.ID_Person = aprimary.ID_Person
  AND a.Version = aprimary.Version
  AND a.ValidTo IS NULL OR a.ValidTo < NOW()
GROUP BY a.ID_Person;

I have this index:

CREATE INDEX idx_addresses ON Address(ID_Person, IsPrimary, ValidTo)

This provides correct results, but is very slow (6000 row in addresses takes 6 sec to perform query)

Here is query execution plan:

id  select_type     table   type    possible_keys   key             key_len     ref     rows    Extra   
1   PRIMARY     <derived2>  ALL     NULL            NULL            NULL        NULL    5936    
2   DERIVED     a           ALL     idx_addresses   NULL            NULL        NULL    5565    Using where; Using temporary; Using filesort
2   DERIVED     <derived3>  ALL     NULL            NULL            NULL        NULL    5936    Using where; Using join buffer
3   DERIVED     a           index   NULL            idx_addresses   10          NULL    5565    Using where; Using index

How can I optimize the query?

But no two people can share the same address? What a lonely little place — Strawberry, Dec 26 '17 at 22:50

score 2 · Accepted Answer · edited Dec 26 '17 at 20:19

Your query is not correct because of the precedence of AND and OR. AND hash higher precedence, so it's being treated as if you'd written.

CREATE VIEW vw_addresses AS
SELECT 
    a.ID,
    a.ID_Person,
    a.Street,
    a.HouseNumber,
    a.City,
    a.Zipcode,
    a.Country,
    a.Version
FROM
    Address AS a, vw_primary_addresses AS aprimary
WHERE
    (a.ID_Person = aprimary.ID_Person
        AND a.Version = aprimary.Version
        AND a.ValidTo IS NULL)
    OR a.ValidTo < NOW()
GROUP BY a.ID_Person;

So everyone with ValidTo < NOW() is being returned as a cross-product in addition to the ones that match the joining condition.

It would be better if you wrote it as an explicit JOIN, then you wouldn't run into this problem.

CREATE VIEW vw_addresses AS
SELECT 
    a.ID,
    a.ID_Person,
    a.Street,
    a.HouseNumber,
    a.City,
    a.Zipcode,
    a.Country,
    a.Version
FROM
    Address AS a
JOIN 
    vw_primary_addresses AS aprimary
ON
    a.ID_Person = aprimary.ID_Person
  AND a.Version = aprimary.Version
WHERE 
  a.ValidTo IS NULL OR a.ValidTo < NOW()
GROUP BY a.ID_Person;

Note that if there's multiple addresses with the same version, your SELECT might select different columns from different addresses. To solve that you need another level of subquery.

CREATE VIEW vw_addresses_id AS
SELECT 
    MAX(a.ID) AS id
FROM
    Address AS a
JOIN 
    vw_primary_addresses AS aprimary
ON
    a.ID_Person = aprimary.ID_Person
  AND a.Version = aprimary.Version
WHERE 
  a.ValidTo IS NULL OR a.ValidTo < NOW()
GROUP BY a.ID_Person;

CREATE VIEW vw_addresses AS
SELECT a.*
FROM Address AS a
JOIN vw_addresses_id AS aprimary
ON a.ID = aprimary.ID

Thanks, works like charm, performs query in 0.03 sec - and thanks for pointing out logical operator preference, I have entirely missed it — Zbynek, Dec 26 '17 at 20:20

MySQL: Optimize greatest row in group query

1 Answers1