0

I'm looking for a perfomant way to write a SQL query.

I've got a table with columns (id,fname,lname,accountid,creation_date,update_date) and i have to look for the records in that table that have the same fname,lname,accountid and have the most recent date based on greatest(max(creation_date),max(update_date)) (note that update_date can be null)

I expect that I need to use analytic functions

I have these cases:

(id,fname,lname,accountid,creation_date,update_date)
(1,'a','a','2','07/01/2010 10:59:43','07/01/2010 10:59:43')
(2,'a','a','2','07/01/2010 10:59:43','07/01/2010 10:59:43')
(3,'a','a','2','07/01/2010 10:59:43','07/01/2010 10:59:43')

I want to choose the last inserted : this record (3,'a','a','2','07/01/2010 10:59:43','07/01/2010 10:59:43')

(id,fname,lname,accountid,creation_date,update_date)
(3,'b','a','2','07/01/2009 10:59:43','07/01/2010 10:59:43')
(4,'b','a','2','07/01/2011 10:59:43',null)
(5,'b','a','2','07/01/2009 10:59:43','07/01/2009 10:59:43')

I want to choose the most recent one on both columns (creation_date,update_date) which is (4,'b','a','2','07/01/201110:59:43',null)

(id,fname,lname,accountid,creation_date,update_date)
(6,'c','g','4','07/01/2010 10:59:43',null)
(7,'c','g','4','07/01/2011 10:59:43',null)
(8,'c','g','4','07/01/2009 10:59:43',null)

I want to choose the most recent one on both columns (creation_date,update_date) which is (7,'c','g','4','07/01/2011 10:59:43',null)

(id,fname,lname,accountid,creation_date,update_date)
(9,'k','t','2','07/01/2009 10:59:43','07/01/2012 10:59:43')
(10,'k','t','2','07/01/2011 10:59:43',null)
(11,'k','t','2','07/01/2009 10:59:43','07/01/2009 10:59:43')

I want to choose the most recent one on both columns (creation_date,update_date) which is (9,'k','t','2','07/01/2009 10:59:43','07/01/2012 10:59:43')

Justin Cave
  • 227,342
  • 24
  • 367
  • 384
R Vive L OL
  • 177
  • 3
  • 3
  • 10
  • I'm not sure that I understand your question. It would be helpful to post some sample data, the desired output, and an explanation of how you got from the sample data to the desired output. It would be even better if you could post that in the form of the `DDL` to create the table and the `DML` to insert the sample data so we can run your code in our environment(s). – Justin Cave May 10 '12 at 21:33
  • In the first set of rows, why do you choose the row with an `id` of 3 when all 6 date values appear to be identical? – Justin Cave May 11 '12 at 20:48

2 Answers2

2

You should use the analytic functions rank() or row_number(). My own particular preference is toward rank() but it only really works well if are partitioning by a unique index. Something like the following, assuming there is a unique index on fname, lname, accountid, creation_date

select *
  from ( select a.*
              , rank() over ( partition by fname, lname, accountid 
                      order by creation_date desc
                              , update_date desc ) as rnk
           from my_table a )
 where rnk = 1

This orders each combination of fname, lname, accountid by creation_date, update_date. Using where rnk = 1, enables you to then select the maximum creation_date, update_date.

Ben
  • 51,770
  • 36
  • 127
  • 149
  • I've been struggling with something for a while now and your rank suggestion worked perfectly! – Sun Apr 05 '14 at 00:01
0

It sounds like you want something like

SELECT *
  FROM (SELECT a.*,
               rank() over (partition by fname, lname, accountid
                                order by coalesce( update_date, creation_date ) desc,
                                         id desc) rnk
          FROM your_table)
 WHERE rnk = 1
  • I'm guessing a bit at why you would want the row with an id of 3 returned rather than the rows with an id of 1 or 2 since all 3 rows have the same update_date and creation_date. My guess is that you want to pick the tie by choosing the row with the largest id value. If you want to implement a different rule, you'll need to tell us what rule that is.
  • I am also assuming that the update_date, if it exists, will always be greater than or equal to the creation_date. It would be odd if a row were updated before it was created.
Justin Cave
  • 227,342
  • 24
  • 367
  • 384