0

I have a table with only 3 columns: id, name and timestamp , and it is required to extract for each id (and therefore for each name) the most recent timestamp.

For example having this table:

id name timestamp
1 foo 2022-02-02
1 foo 2022-03-02
2 bar 2022-01-01
2 bar 2022-04-01

It is expected to get as result:

1,foo,2022-03-02
2,bar,2022-04-01

One implementation could be:

select id, name, timestamp
from
(select 
    *,
    row_NUMBER() over(partition by id order by timestamp desc ) ranking
from sample) a
where a.ranking=1

Another possible solution would be this:

select *
from
(select id,max(timestamp) max_timestamp
from sample
group by id) foo,
sample s
where foo.id=a.id
and a.timestamp=foo.max_timestamp

but I think is less efficient.

How can I formally compute the space complexity of the first solution and compare it with the second one?

The Impaler
  • 45,731
  • 9
  • 39
  • 76
  • 1
    Which dbms are you using? – jarlh Jun 12 '22 at 15:56
  • 1
    You can see source code for your dbms, compare it with an execution plan, and finally, calculate how much memory it takes. – user14063792468 Jun 12 '22 at 15:58
  • 1
    The dbms I am using is postgreSQL – decoyoctopus1489 Jun 12 '22 at 16:00
  • `select id, name, max(timestamp) from T group by id, name` – shawnt00 Jun 12 '22 at 17:42
  • As @Kadet indicates an EXPLAIN/ EXPLAIN ANALYZE will show you what the optimizer thinks is best. But the best determination is to actually run them - especially since you have already written them - and compare the results. Include other versions (i.e, Kadet's answer). Run each several times, in different order, with different amounts of data, different server workloads, etc. Record then analyze them. Choose the best performer. The only way to truly know: Test, Test, Test. – Belayer Jun 12 '22 at 18:51
  • Also, those two queries may not give the same result. If two row exists with the same id and the same timestamp, the first query selects 1 (arbitrarily), the second will select both. – HoneyBadger Jun 12 '22 at 20:40

2 Answers2

2

You can easily use group by and max functions to get the last timestamp for each id.

select id
      ,name
      ,max(timestamp)
from t
group by id, name 
id name max
1 foo 2022-03-02 00:00:00
2 bar 2022-04-01 00:00:00

Fiddle

DannySlor
  • 4,574
  • 1
  • 3
  • 11
1

If we want to take last timestamp from each id, and name must be from this last record, then:

select 
  distinct on (id) id,
  name,
  timestamp 
from sample 
order by id,timestamp desc

Complexity can be checked by using EXPLAIN keyword before query. EXPLAIN show estimates, EXPLAIN ANALYZE additionally run query and shows real time consumed by query. Efficiency depends on table definition and indexes.

Kadet
  • 1,344
  • 3
  • 10