1

I have a simple SQLITE db that looks a bit like this

ID      TID      LASTUPDATE     UPDATE
============================================
1       213      2020-09-09         ok
2       416      2019-12-25         ok
3       213      2020-11-10         meh
...
999999  899      2020-12-11         bad

There are around a thousand DISTINCT TIDs and hundreds of thousands of updates for each TID.

I would like to get the last UPDATE for each distinct TID irrespective of when it was made and I'd like to do it in one go. I don't want to get all the distinct TIDs then for each TID get its UPDATE and LASTUPDATE

Something like "SELECT * FROM updates GROUP BY TID" won't work because the GROUP BY will give the first TID it finds and I want the last one it did, so I don't want this:

1       213      2020-09-09         ok
2       416      2019-12-25         ok
...

When I want (in effect, ordering doesn't matter)

3       213      2020-11-10         meh
2       416      2019-12-25         ok
...

Unfortunately my knowledge of SQL ends with simple CRUD stuff.

Any help appreciated.

(NEXT DAY)

As an update to this - for me anyway - the basic way of getting all the distinct tids then foreach tid get its last update was faster (.5s) when the number rows in the table was below 200,000 once it went above that the runtime rocketed into many secs and the 3rd way became much more useful as although it took 2.5s to run it appears that that was a constant runtime that didn't appear to change greatly.

Also I couldn't get the 2nd version to work, but with some editing

select t.*
from updates t join
     (select id, tid, max(t.lastupdate) as max_lastupdate from updates t group by vid) tt
      on t.id = tt.id and (t.lastupdate = tt.max_lastupdate);

I'm finding that its runtime is always around 250ms which easily out performs both versions.

So thanks Gordon Linoff, I appreciate your time on this.

push 22
  • 1,172
  • 3
  • 15
  • 34

1 Answers1

1

One method uses a correlated subquery:

select t.*
from t
where t.lastupdate = (select max(t2.lastupdate) from t t2 where t2.tid = t.tid);

For performance, you want an index on (tid, lastupdate).

You can also try phrasing this as:

select t.*
from t join
     (select tid, max(t2.lastupdate) as max_lastupdate
      from t
      group by tid
     ) tt
     on t.id = tt.id and t.lastupdate = tt.max_lastupdate;

Or:

select t.*
from (select t.*, 
             row_number() over (partition by tid order by lastupdate desc) as seqnum
      from t
     ) t
where seqnum = 1;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Not sure if this works or not, the query has been running for 40minutes already! might need to add some more indexes!!! – push 22 Nov 10 '20 at 15:36
  • ok that works, but even with optimisations still takes over 5 mins to bring back the results, which for my needs - GUI display updates - won't work. If I do it the long way, get a list of distinct tids then get the max lastupdate foreach tid that is still slow (.5 sec) but good enough - thanks you taught me something. – push 22 Nov 10 '20 at 16:36
  • @push22 . . . I added two other options. I suggested the correlated subquery because I thought it would be fastest. If you care to try the others, I would be interested in how they perform. – Gordon Linoff Nov 10 '20 at 16:40
  • 1
    thanks the last one bought the run time down to 2.5 secs! much quicker but its still quicker the long way - but hey, it is what it is! – push 22 Nov 10 '20 at 16:49