Join one table with the most recent rows of another

Question

I have 2 MySQL tables consisting of the following information:

table1 (basic information)

name | url
a    | www.a.com
b    | www.b.com
c    | www.c.com

table2 (time series data)

name | status | date
a    | ok     | 22/12/14
b    | ok     | 22/12/14
c    | ok     | 22/12/14
a    | ok     | 21/12/14
b    | ok     | 21/12/14
c    | ok     | 21/12/14
etc

I need to do a join so I have all the entries from table1 joined with the most recent entries of table2. So the output would look like:

output

name | url       | status | date
a    | www.a.com | ok     | 22/12/14
b    | www.b.com | ok     | 22/12/14
c    | www.c.com | ok     | 22/12/14

What query would give the output above?

possible duplicate of [How do I join the most recent row in one table to another table?](http://stackoverflow.com/questions/497535/how-do-i-join-the-most-recent-row-in-one-table-to-another-table) — abl, Dec 23 '14 at 21:02

score 2 · Accepted Answer · answered Dec 23 '14 at 20:19

That's a tricky one. What you can do is join the second table twice - one to find the "newest" lines and the second time to get the actual data.

SELECT t1.name, t1.url, t2.status, t2.date
FROM table1 t1
LEFT JOIN (SELECT name, max(date) as mx from table2 GROUP BY name) as X ON X.name = t1.name
LEFT JOIN table2 t2 0N t2.name = X.name AND t2.date = X.mx

I used name for joining. You'd normally use some keys (ids)

score 1 · Answer 2 · answered Dec 27 '14 at 01:11

I specialize in such time-sensitive designs and here is what I do. Your second table is a Versioned table in that, like source control systems, when the data is changed, the old data remains, just a new copy is made with the date the change was made. A small change can add full bi-temporal functionality, but that's not your question, is it? 8)

If, like I have found to be true, you notice that the overwhelming majority of the queries against this table are for current data, then one thing you may want to consider is creating a view to expose only the current version of each row.

create view tab2 as
select *
from   table2 t2
where  date =(
    select  max( date )
    from    table2
    where   name = t2.name );

Then you can simply join the first table with the view for a one-to-one correlation with the data in table1 with only the current data in table2. This allows you to abstract away the time-sensitive nature of the data.

If there are reasons you can't use a view (such as an old-school DBA who has seizures at the thought of joining with a view) then you have to write the whole thing as one query. Fortunately, that's not difficult, but abstraction is handy.

select t1.Name, t1.URL, t2.Status, t2.Date
from   table1 t1
join   table2 t2
  on   t2.Name = t1.Name
  and  t2.Date =(
       select  max( Date )
       from    table2
       where   name = t2.name );

Some DBMSs do not allow a subquery in the join. In that case, just move it to the WHERE clause:

select t1.Name, t1.URL, t2.Status, t2.Date
from   table1 t1
join   table2 t2
  on   t2.Name = t1.Name
where  t2.Date =(
       select  max( Date )
       from    table2
       where   name = t2.name );

If Name and Date form a unique index (either defined explicitly or because they form the PK of the table), you will find performance to be much better than you might at first think. Try it and compare with alternatives.

Another nice thing about using a view is that you can write a "before" or "instead of" trigger to convert Update statements into the proper Insert to the table of a new version, thus maintaining the abstraction. — TommCatt, Dec 27 '14 at 22:06

score 0 · Answer 3 · answered Dec 23 '14 at 20:13

0

I usually work in SQLServer so you'll have to correct the syntax if applicable, but basically you're going to group by the max date available and only return those rows. See the code below and give it a try! Let me know if this helps.

SELECT t1.name, t1.url, t2.status, max(t2.date)
FROM table1 t1
INNER JOIN table2 t2 ON t1.name = t2.name
GROUP BY t1.name, t1.url, t2.status, max(t2.date)

answered Dec 23 '14 at 20:13

redxdotrocks

34
2

Looks promising but I get an 'invalid use of the group function' error – Eduardo Dec 23 '14 at 20:23
Sorry about that, it's likely a syntax issue. Looks like you're up to your ears in answers, please give one of theirs a try. I'd be interested to know which one worked for you. – redxdotrocks Dec 23 '14 at 20:24

HaveNoDisplayName · Answer 4 · 2014-12-23T20:56:14.813

0

Try this, Group By with dates

SELECT  a.*,b.*
FROM    table1 a
        INNER JOIN
        (
            SELECT  table2.name name2, MAX(Date) max_date, status
            FROM    table2
            GROUP   BY name, status

        ) b ON  a.name = b.name2

edited Dec 23 '14 at 20:56

answered Dec 23 '14 at 20:15

HaveNoDisplayName

8,291
106
37
47

score 0 · Answer 5 · answered Dec 23 '14 at 20:19

0

    Select t1.*, t2.status,t2.date 
    from table1 t1 inner join table t2 
    on t1.name = t2.name 
    where t2.date=(select max(date) from table2)

answered Dec 23 '14 at 20:19

varun

1,473
1
9
15

Cristian Scutaru · Answer 6 · 2014-12-23T23:28:48.983

Query with no grouping or aggregates:

SELECT t1.name, t1.url, t21.status, t21.date
FROM table1 t1
INNER JOIN table2 t21 ON t1.name = t21.name
LEFT JOIN table2 t22 ON t21.name = t22.name AND t21.date < t22.date
WHERE t22.name IS NULL;

Another new alternative:

SELECT t1.name, t1.url, t2.status, t2.date
FROM table1 t1 INNER JOIN table2 t2 ON t1.name = t2.name
WHERE t2.date = (SELECT max(date) FROM table2 t22 WHERE t22.name = t2.name);

MySQL does not support (yet?) the popular window functions, added to most other RDBMSs today (Oracle, SQL Server, PostgreSQL). This is how natural such a query would be written with ROW_NUMBER:

SELECT name, url, status, date from (
 SELECT t1.name, t1.url, t2.status, t2.date,
 ROW_NUMBER() OVER (PARTITION BY t1.name ORDER BY t2.date DESC) rn
 FROM table1 t1 INNER JOIN table2 t2 ON t1.name = t2.name
) tmp WHERE rn = 1;

score -1 · Answer 7 · answered Dec 23 '14 at 20:19

-1

SELECT t1.name, t1.url, t2.status, t2.date 
   FROM 
     table1 t1 
     JOIN table2 latest ON latest.name = t1.name 
     JOIN table2 t2 ON t2.name = latest.name AND t2.date = MAX(latest.date)
   GROUP BY t1.name, t1.url

What this does is joins table2 to itself, in order to find the latest date for the given name.

answered Dec 23 '14 at 20:19

Daniel

4,481
14
34

An aggregate cannot appear in an ON clause unless it is in a subquery contained in a HAVING clause or select list? – HaveNoDisplayName Dec 23 '14 at 20:41

Join one table with the most recent rows of another

7 Answers7