Understanding correlation in MySQL

Question

I have a table of orders in which the same personID can appear more than once, one row per order that person has placed. Each order has a date and a status code from 1 to 4, where 4 means a cancelled order. I am using the following query:

SELECT
    personID, MAX(date), status
FROM
    orders
WHERE
    status = 4
GROUP BY
    personID

The problem is that while this DOES return one record per person with their most recent order date, it does NOT give me the correct status. In other words, I assumed that the status would come from the same row as the MAX(date), and it does not. It simply pulls, seemingly at random, one of the statuses from one of the orders. Can I make the query more specific so that it returns the EXACT status from the same record as the MAX(date)?

try `GROUP BY personID, status` – Tin Tran, Apr 19 '16 at 21:10

score 1 · Accepted Answer · answered Apr 19 '16 at 21:13

Unfortunately, there is no simple way to get what you want. Most other RDBMS vendors don't even consider queries using aggregate functions valid unless all non-aggregated result fields are in the GROUP BY. The general solution for these kinds of questions usually involves a subquery to get the "last" records, which is then joined to the original table to get those rows.

Depending on the structure of your data, this may or may not be possible. For instance, if you have multiple rows with the same personID and date, there is no way to determine from those two columns alone which row's status should be used.
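A minimal sketch of that subquery-plus-join approach, using the table and column names from the question. The aliases `o`, `latest`, and `lastDate` are made up for illustration:

```sql
-- Latest order per person, with the status taken from that same row.
SELECT o.personID, o.date, o.status
FROM orders AS o
JOIN (
    -- One row per person: the date of their most recent order.
    SELECT personID, MAX(date) AS lastDate
    FROM orders
    GROUP BY personID
) AS latest
  ON latest.personID = o.personID
 AND latest.lastDate = o.date;
```

If two orders share the same personID and date, both rows come back, so this only yields one row per person when that pair is unique.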

score 0 · Answer 2 · edited May 23 '17 at 10:28

To get the result you want, you could use:

SELECT personID, date, status
FROM orders
WHERE (personID,date) IN (SELECT personID, MAX(date)
                          FROM orders
                          -- WHERE status = 4
                          GROUP BY personID);
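Where the `status = 4` filter goes changes what the query means, and the question could be read either way. The two variants below are a sketch of both readings, not a statement of what the asker intended:

```sql
-- Reading 1: people whose most recent order (of any status) is cancelled.
-- The filter is applied after the latest row per person has been picked.
SELECT personID, date, status
FROM orders
WHERE (personID, date) IN (SELECT personID, MAX(date)
                           FROM orders
                           GROUP BY personID)
  AND status = 4;

-- Reading 2: the most recent cancelled order of each person.
-- The filter is applied before MAX(date) is computed, and repeated outside
-- so that a non-cancelled order on the same date is not matched as well.
SELECT personID, date, status
FROM orders
WHERE (personID, date) IN (SELECT personID, MAX(date)
                           FROM orders
                           WHERE status = 4
                           GROUP BY personID)
  AND status = 4;
```

With the second reading, status is always 4 in the result, so selecting it adds no information.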

As for:

"It simply pulls, seemingly at random, one of the statuses from one of the orders."

It works as intended. The MySQL manual describes this behavior:

"MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate."
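If you would rather have MySQL reject such queries than pick an arbitrary value, the `ONLY_FULL_GROUP_BY` SQL mode can be enabled. A rough sketch, assuming a session where it is acceptable to override the current `sql_mode`:

```sql
-- Assumption: replacing the session's sql_mode is fine here; this drops
-- any other modes for the current session only.
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';

-- status is neither aggregated nor listed in GROUP BY, so with the mode
-- enabled this should fail with an error instead of returning an
-- indeterminate status.
SELECT personID, MAX(date), status
FROM orders
GROUP BY personID;
```

The `WHERE status = 4` clause is left out on purpose: newer MySQL versions may accept a nonaggregated column that the WHERE clause pins to a single value, which would hide the effect.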

Related: Group by clause in mySQL and postgreSQL, why the error in postgreSQL?
