1

Given the following database table, which records events (status) for different objects (id) with its timestamp:

ID | Date       | Time | Status
-------------------------------
 7 | 2016-10-10 | 8:23 | Passed
 7 | 2016-10-10 | 8:29 | Failed
 7 | 2016-10-13 | 5:23 | Passed
 8 | 2016-10-09 | 5:43 | Passed

I want to get a result table using plain SQL (MS SQL) like this:

ID | Date       | Status
------------------------
 7 | 2016-10-10 | Failed
 7 | 2016-10-13 | Passed
 8 | 2016-10-09 | Passed

where the "status" is the latest entry on a day, given that at least one event for this object has been recorded.

My current solution is using "Outer Apply" and "TOP(1)" like this:

SELECT DISTINCT rn.id,
                tmp.date,
                tmp.status

FROM run rn OUTER apply
  (SELECT rn2.date, tmp2.status AS 'status'
   FROM run rn2 OUTER apply
     (SELECT top(1) rn3.id, rn3.date, rn3.time, rn3.status
      FROM run rn3
      WHERE rn3.id = rn.id
        AND rn3.date = rn2.date
      ORDER BY rn3.id ASC, rn3.date + rn3.time DESC) tmp2
   WHERE tmp2.status <> '' ) tmp

As far as I understand this outer apply command works like:

For every id
  For every recorded day for this id
     Select the newest status for this day and this id

But I'm facing performance issues, therefore I think that this solution is not adequate. Any suggestions how to solve this problem or how to optimize the sql?

rcvd
  • 51
  • 1
  • 6

3 Answers3

1

Your code seems too complicated. Why not just do this?

SELECT r.id, r.date, r2.status
FROM run r OUTER APPLY
     (SELECT TOP 1 r2.*
      FROM run r2
      WHERE r2.id = r.id AND r2.date = r.date AND r2.status <> ''
      ORDER BY r2.time DESC
     ) r2;

For performance, I would suggest an index on run(id, date, status, time).

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Great, this little change - which I was too blind to see - made the code running 8 times faster. Thanks a lot! – rcvd Dec 02 '16 at 12:38
0

Using a CTE will probably be the fastest:

with cte as 
(
    select ID, Date, Status, row_number() over (partition by ID, Date order by Time desc) rn
    from run
)
select ID, Date, Status 
from cte
where rn = 1
James Casey
  • 2,447
  • 1
  • 11
  • 19
-1

Do not SELECT from a log table, instead, write a trigger that updates a latest_run table like:

CREATE TRIGGER tr_run_insert ON run FOR INSERT AS 
BEGIN
    UPDATE latest_run SET Status=INSERTED.Status WHERE ID=INSERTED.ID AND Date=INSERTED.Date
    IF @@ROWCOUNT = 0
        INSERT INTO latest_run (ID,Date,Status) SELECT (ID,Date,Status) FROM INSERTED
END

Then perform reads from the much shorter lastest_run table. This will add a performance penalty on writes because you'll need two writes instead of one. But will give you much more stable response times on read. And if you do not need to SELECT from "run" table you can avoid indexing it, therefore the performance penalty of two writes is partly compensated by less indexes maintenance.

Serg M Ten
  • 5,568
  • 4
  • 25
  • 48
  • 1
    Thank you for this suggestion, but it's a commercial application which I'm not allowed to change. – rcvd Dec 02 '16 at 12:28