0

I have a table that stores the lines on a contract. Each contract line his it's own unique ID, it also has the ID of its parent contract. Example:

+-------------+---------+
| contract_id | line_id |
+-------------+---------+
|        1111 |     100 |
|        1111 |     101 |
|        1111 |     102 |
+-------------+---------+

I have another table that stores the historical changes to contract lines. For example, every time the number of units on a contract line is changed a new row is added to the table. Example:

+-------------+---------+--------------+-------+
| contract_id | line_id | date_changed | units |
+-------------+---------+--------------+-------+
|        1111 |     100 | 2016-01-01   |     1 |
|        1111 |     100 | 2016-02-01   |     2 |
|        1111 |     100 | 2016-03-01   |     3 |
+-------------+---------+--------------+-------+

As you can see the contract line with ID 100 belonging to the contract with ID 1111 has been edited 3 times over 3 months. The current value is 3 units.

I'm running a query against the contract lines table to select all data. I want to join to the historical data table and select the most recent row for each contract line and show the units in my results. How do I do this?

Expected results (there would single results for 101 and 102 as well):

+-------------+---------+-------+
| contract_id | line_id | units |
+-------------+---------+-------+
|        1111 |     100 |     3 |
+-------------+---------+-------+

I've tried the query below with a left join but it returns 3 rows instead of 1.

Query:

SELECT *, T1.units
FROM contract_lines
LEFT JOIN (
    SELECT contract_id, line_id, units, MAX(date_changed) AS maxdate
    FROM contract_history
    GROUP BY contract_id, line_id, units) AS T1
    ON contract_lines.contract_id = T1.contract_id 
    AND contract_lines.line_id = T1.line_id

Actual results:

+-------------+---------+-------+
| contract_id | line_id | units |
+-------------+---------+-------+
|        1111 |     100 |     1 |
|        1111 |     100 |     2 |
|        1111 |     100 |     3 |
+-------------+---------+-------+
Equalsk
  • 7,954
  • 2
  • 41
  • 67

5 Answers5

3

An extra join to contract_history along with maxdate will work

SELECT contract_lines.*,T2.units
FROM contract_lines
LEFT JOIN (
    SELECT contract_id, line_id, MAX(date_changed) AS maxdate
    FROM contract_history
    GROUP BY contract_id, line_id) AS T1 
    JOIN contract_history T2 ON 
         T1.contract_id=T2.contract_id and 
         T1.line_id= T2.line_id and 
         T1.maxdate=T2.date_changed
ON contract_lines.contract_id = T1.contract_id
AND contract_lines.line_id = T1.line_id

Output

enter image description here

nobody
  • 10,892
  • 8
  • 45
  • 63
  • The difference is here there isnt `units` field. you get that from original table – Juan Carlos Oropeza Jul 20 '16 at 16:27
  • This doesn't work for me. I get an error on the last two lines saying "The multi-part identifier 'contract_lines.contract_id' could not be bound" and the same for 'line_id' respectively. This is not an error I'm familiar with. Any ideas? – Equalsk Jul 21 '16 at 08:42
1

This is my preferred style because it doesn't require self joining and cleanly expresses your intent. Also, it competes very well with the ROW_NUMBER() method in terms of performance.

select a.*
     , b.units
from contract_lines as a
join (
    select a.contract_id
         , a.line_id
         , a.units
         , Max(a.date_changed) over(partition by a.contract_id, a.line_id) as max_date_changed
    from contract_history as a
) as b
    on a.contract_id = b.contract_id
   and a.line_id = b.line_id
   and b.date_changed = b.max_date_changed;
Kittoes0124
  • 4,930
  • 3
  • 26
  • 47
0

Another possible solution to this. This uses RANK to sort/filter this. Similar to what you did, just a different tact.

SELECT contract_lines.*, T1.units
FROM contract_lines
LEFT JOIN (
    SELECT contract_id, line_id, units,
    RANK() OVER (PARTITION BY contract_id, line_id ORDER BY date_changed DESC) AS [rank]
    FROM contract_history) AS T1
ON contract_lines.contract_id = T1.contract_id 
AND contract_lines.line_id = T1.line_id
AND T1.rank = 1
WHERE T1.units IS NOT NULL

You could change this to a INNER JOIN and remove the IS NOT NULL in the WHERE clause if you expect data to be present all the time.

Glad you figured it out!

R. Richards
  • 24,603
  • 10
  • 64
  • 64
0

Try this simple query:

SELECT TOP 1 T1.*
FROM contract_lines T0 
    INNER JOIN contract_history T1 
        ON T0.contract_id = T1.contract_id and 
            T0.line_id = T1.line_id 
ORDER BY date_changed DESC
Eduardo Pelais
  • 1,627
  • 15
  • 21
-1

As always seems to be the way after spending an hour looking at it and shouting at StackOverflow for having a rare period of maintenance I solve my own problem not long after posting a question.

In an effort to help anyone else who's stuck I'll show what I found. It might not be an efficient way to achieve this so if someone has a better suggestion I'm all ears.

I adapted the answer from here: T-SQL Subquery Max(Date) and Joins

SELECT *,
       Units = (SELECT TOP 1 units
                FROM contract_history
                WHERE contract_lines.contract_id = contract_history.contract_id
                AND contract_lines.line_id = contract_history.line_id
                ORDER BY date_changed DESC
                )
FROM ....
Community
  • 1
  • 1
Equalsk
  • 7,954
  • 2
  • 41
  • 67