Efficiently get IDs with a given status from table of status updates

Question

I have a SQLite table that records a history of status updates with columns TaskID, Timestamp, and Status. Each time the status of a task is updated, I add a row to the history table describing the new status. The current status of a task is defined as the Status value of the row with the largest (latest) Timestamp of all the rows matching the given TaskID. I'm running into problems when the table reaches has ~25,000 rows.

I need a little help efficiently retrieving a single task with status "not started". Conceptually, the table looks like this:

TaskID   Status          Timestamp
------   -------------   ---------
1        "not started"   1
2        "not started"   2
2        "started"       3
1        "started"       4
1        "error"         5
1        "not started"   6

And I'd like to identify Task 1 as a task with current status "not started".

I have tried

SELECT TaskID, h.Status
     FROM history AS h
     WHERE Timestamp = (SELECT MAX(Timestamp)
         FROM history as h2
         WHERE h2.TaskID = h.TaskID)
         AND h.Status = "not started"
     LIMIT 1

which does retrieve a correct row, but takes ~5 seconds. Ideally I'd like to get the query below 1 second.

I have been reading about GROUP BY ... HAVING as a substitute for the compound SELECT (my guess at the bottleneck) but can't quite figure out how to first select the most recent row per task and then filter out rows that don't match my condition on Status.

So, is there a better way to specify this query? Would adding an index help? Is there a better structure for this table? Or have I reached the point where I need to upgrade beyond SQLite to get the performance I need?

score 2 · Accepted Answer · answered Dec 30 '19 at 21:16

Try with ROW_NUMBER() window function:

select t.TaskID, t.Status, t.Timestamp
from (
  select *,
    row_number() over (partition by TaskID order by Timestamp desc) rn
  from history  
) t  
where t.rn = 1 and t.Status = 'not started'

See the demo.
Results:

| TaskID | Status      | Timestamp |
| ------ | ----------- | --------- |
| 1      | not started | 6         |

Efficiently get IDs with a given status from table of status updates

1 Answers1