1

Really struggling with a query that uses groupwise maximum, any help would be much appreciated. Feel free to point out if I should not be using groupwise maximum.

I have two tables application and email, one application can have many emails. What I'm trying to do in my query is get all details from application and join the email table (I'm actually only getting a foreign key from email for another table which indicates if the email has been replied to), getting the last email sent based on the max(timestamp), which is why I am trying to use groupwise maximum.

I've tried this, but it seems to make a duplicate of each row:

SELECT  `application` . * ,  `email1`.`student_email_id` AS  `email_student_email_id` 
FROM  `application` 
LEFT JOIN (
  SELECT MAX( tstamp ) AS tstamp, id, student_email_id, application_id
  FROM email
  GROUP BY id, student_email_id, application_id
) AS email1 ON  `email1`.`application_id` =  `application`.`id` 
WHERE  `application`.`status` =  'returned'

This is what seemed to work at first but is causing issues now and I'm sure it's pretty sloppy code:

select `application`.*, `email1`.`student_email_id` as `email_student_email_id`
from `application` 
left join (
  select student_email_id, max(tstamp) as tstamp, application_id
  from email 
  group by application_id, tstamp
  order by tstamp desc
  limit 1) as email1 on `email1`.`application_id` = `application`.`id` 
where `application`.`status` = 'returned'

Any guidance would be highly appreciated, if you need to see more code please ask! Thanks.

Further clarity if needed for my db set up and what should be happening (left out unimportant parts):

Application Table
+----+----------+
| id |  status  |
+----+----------+
|  1 | returned |
+----+----------+

Email Table
+----+------------+----------------+------------------+
| id |   tstamp   | application_id | student_email_id |
+----+------------+----------------+------------------+
|  1 | 2014-12-26 |              1 | NULL             |
|  2 | 2014-12-27 |              1 | 3                |
+----+------------+----------------+------------------+

The query should be showing the following:

+----+----------+------------------------+
| id |  status  | email_student_email_id |
+----+----------+------------------------+
|  1 | returned |                      3 |
+----+----------+------------------------+

First solution above shows duplicates of everything (maybe I'm nearly there) and second one shows null for the joined table columns, although I'm sure it did work at one stage or in isolation at least!

haakym
  • 12,050
  • 12
  • 70
  • 98

1 Answers1

4

You're looking for the latest row in your Email table for each distinct application_id.

Your subquery to get that isn't quite right. Here's how you get that.

SELECT s.application_id, e.student_email_id
  FROM email e
  JOIN (
         SELECT MAX(tstamp) tstamp, application_id
           FROM email
          GROUP BY application_id
       ) s ON e.application_id = s.application_id AND e.tstamp = s.tstamp

There's another way to do this, that might be more efficient. It will work if the id column is an autoincrement column.

SELECT s.application_id, e.student_email_id
  FROM email e
  JOIN (
         SELECT MAX(id) id
           FROM email
          GROUP BY application_id
       ) s ON e.id = s.id

Either of these preceding subqueries gets the latest student_email_id for each application_id. The second one uses the JOIN to extract only the highest id number for each application_id, and uses that id to find the latest student_email_id.

Your subquery was this. It doesn't get what you hoped for.

 SELECT MAX( tstamp ) AS tstamp, id, student_email_id, application_id /*wrong*/
   FROM email
  GROUP BY id, student_email_id, application_id 

You grouped this by id. That means you're going to get all the detail rows. That's not what you want. Even this

 SELECT MAX( tstamp ) AS tstamp, student_email_id, application_id  /*wrong*/
   FROM email
  GROUP BY student_email_id, application_id 

will give you more than one record for each application_id value.

So the query you need is:

SELECT  application.* ,  email1.student_email_id AS  email_student_email_id 
  FROM  application 
  LEFT JOIN (
              SELECT s.application_id, e.student_email_id
                FROM email e  
                JOIN (
                       SELECT MAX(id) id
                         FROM email
                        GROUP BY application_id
                     ) s ON e.id = s.id
           ) AS email1 ON  email1.application_id =  application.id 
 WHERE application.status =  'returned'

When you're designing queries like this, it's smart to test from the inside out, starting with the innermost subquery.

O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • thank you very much indeed this has been an extremely huge help. Also I'd like to further thank you for the detailed explanation you provided and the alternative answer which utilised the auto-incrementing primary key. – haakym Dec 28 '14 at 16:22
  • 1
    Thanks for the edits, @haakym. I corrected my answer. – O. Jones Dec 28 '14 at 16:29