I have a SQLite table that records a history of status updates with columns TaskID
, Timestamp
, and Status
. Each time the status of a task is updated, I add a row to the history
table describing the new status. The current status of a task is defined as the Status value of the row with the largest (latest) Timestamp of all the rows matching the given TaskID. I'm running into problems when the table reaches has ~25,000 rows.
I need a little help efficiently retrieving a single task with status "not started". Conceptually, the table looks like this:
TaskID Status Timestamp
------ ------------- ---------
1 "not started" 1
2 "not started" 2
2 "started" 3
1 "started" 4
1 "error" 5
1 "not started" 6
And I'd like to identify Task 1 as a task with current status "not started".
I have tried
SELECT TaskID, h.Status
FROM history AS h
WHERE Timestamp = (SELECT MAX(Timestamp)
FROM history as h2
WHERE h2.TaskID = h.TaskID)
AND h.Status = "not started"
LIMIT 1
which does retrieve a correct row, but takes ~5 seconds. Ideally I'd like to get the query below 1 second.
I have been reading about GROUP BY ... HAVING
as a substitute for the compound SELECT
(my guess at the bottleneck) but can't quite figure out how to first select the most recent row per task and then filter out rows that don't match my condition on Status
.
So, is there a better way to specify this query? Would adding an index help? Is there a better structure for this table? Or have I reached the point where I need to upgrade beyond SQLite to get the performance I need?
Related: