1

I need to select the maximum amounts in one column for a common id in another column. There could be several id's in the report_id column that have the same, maximum last_update amounts.

Data structure:

+------+-------+--------------------------------------------+
| id   | report_id   | last_update                          |
+------+-------------+--------------------------------------+
| 1    | 1           | 2019-01-24                           |
| 2    | 1           | 2019-01-24                           |
| 3    | 1           | 2019-01-24                           |
| 4    | 2           | 2019-01-24                           |
| 5    | 3           | 2019-01-23                           |
+------+-------+--------------------------------------------+

The problem I am having so far is I can't seem to isolate my results simply by the report_id. For example, with the following query:

"SELECT report_id, last_update
 FROM reports
 WHERE last_update=(
               SELECT MAX(last_update) FROM reports 
               WHERE report_id='1'
               );  
            ";

This returns:

+------+-------+--------------------------------------------+
| id   | report_id   | last_update                          |
+------+-------------+--------------------------------------+
| 1    | 1           | 2019-01-24                           |
| 2    | 1           | 2019-01-24                           |
| 3    | 1           | 2019-01-24                           |
| 4    | 2           | 2019-01-24                           |
+------+-------+--------------------------------------------+ 

So it is nearly correct, but it also is including report_id 2 because it also has the MAX value of 2019-01-24 in last_update.

What I really need to do is select all columns with report_id as 1, and then select only the rows from that result set with MAX(last_update) but I have been looking at every greatest-nth-per-group and associated question on SO and I just can't get this one.

Anytime I bring MAX into the query it seems to negate the fact I am trying to isolate by report_id as well.

absentx
  • 1,397
  • 4
  • 17
  • 31

1 Answers1

1

Here are a few solutions:

Tuple comparison:

SELECT report_id, last_update
 FROM reports
 WHERE (report_id, last_update) = (
               SELECT report_id, MAX(last_update) FROM reports 
               WHERE report_id='1'
               GROUP BY report_id
               );

Tuple comparison with a derived table instead of a dependent subquery:

SELECT report_id, last_update
 FROM reports
 INNER JOIN (
   SELECT report_id, MAX(last_update) AS last_update
   FROM reports WHERE report_id='1' GROUP BY report_id
 ) USING (report_id, last_update);

No-subquery solution, using exclusion join to find the reports for which no other report has the same report_id and a greater update date:

SELECT r1.*
FROM reports AS r1
LEFT OUTER JOIN reports AS r2
  ON r1.report_id=r2.report_id AND r1.last_update<r2.last_update
WHERE r2.report_id IS NULL;

MySQL 8.0 solution with windowing functions:

WITH ranked_reports AS (
  SELECT r.*, DENSE_RANK() OVER (PARTITION BY report_id ORDER BY last_update DESC) AS dr
  FROM reports WHERE report_id='1'
)
SELECT * FROM ranked_reports WHERE dr=1;
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Extremely thorough and all working, thanks. Would the tuple comparison with a derived table be the best option in terms of speed rather than the sub query options? – absentx Jan 24 '19 at 22:34
  • 1
    Depends on your version of MySQL (index optimization for tuple comparisons was implemented MySQL 5.7: https://dev.mysql.com/doc/refman/5.7/en/row-constructor-optimization.html). And probably depends on your data. I suggest you test the different solutions on your system. – Bill Karwin Jan 24 '19 at 23:04