1

I have a table with data that updates daily in the following format:

id | aa | bb | update_day

That means the ids repeat every time it is updated, making it a "history" table (records every change on the id and when it was made).

My goal is to get the data for an entire row when aa's value is maximum for each different id.

Example:

id  aa  bb  update_day
1   12  Foo 2019-08-02
1   13  Foo 2019-08-03
1   14  Bar 2019-08-04
2   25  Baz 2019-08-02
2   26  Baz 2019-08-03
2   27  Baz 2019-08-04

In this case, since there are two different ids, I need 2 rows as a result, which are Row 3 and Row 6, since column aa's value is maximum on those rows for each of these ids.

If I do the following query it returns the rows I want. However it doesn't include all the data I need (column bb).

    SELECT id, MAX(aa) FROM table GROUP BY id

This returns

    1, 14
    2, 27

However, if I try to also get the rest of the data, since bb and update_day change, it returns more than 2 rows:

    SELECT id, MAX(aa), bb FROM table GROUP BY id, bb

This returns

    1, 13, Foo
    1, 14, Bar
    2, 27, Baz

And I only need the last two rows.

I tried removing bb from the GROUP BY clause but it returns the error ORA-00979: not a GROUP BY expression

forpas
  • 160,666
  • 10
  • 38
  • 76

3 Answers3

1

With NOT EXISTS:

select t.id, t.aa, t.bb 
from tablename t
where not exists (
  select 1 from tablename
  where id = t.id and aa > t.aa
)

If there is a case for 2 rows of the same id containing the maximum aa, this code will return both rows.
If you don't want 2 rows:

select t.id, t.aa, max(t.bb) bb from tablename t
where not exists (
  select 1 from tablename
  where id = t.id and aa > t.aa
)
group by t.id, t.aa
forpas
  • 160,666
  • 10
  • 38
  • 76
1
select *
from table
where (id, aa) in 
(
    SELECT id, MAX(aa) FROM table GROUP BY id
)
ramana_k
  • 1,933
  • 2
  • 10
  • 14
1

You can consider using row_number() :

with t as
(
 select t0.*,
        row_number() over (partition by id order by aa desc) as rn
   from t0
)   
select id, aa, bb, update_day
  from t
 where rn = 1;

ID  AA  BB    UPDATE_DAY
--  --  ----  -----------
1   14  Bar   2019-08-04
2   27  Baz   2019-08-04

or max(aa) over (...) analytic function :

with t as
(
 select t0.*,
        max(aa) over ( partition by id ) as mx
   from t0
)   
select id, aa, bb, update_day
  from t
 where aa = mx;

ID  AA  BB    UPDATE_DAY
--  --  ----  -----------
1   14  Bar   2019-08-04
2   27  Baz   2019-08-04

or dense_rank() analytic function ( perhaps would be the best for your case if you have ties for aa and update_day values. I mean if there exist cases where aa and update_day repeat for the same id values ) :

with t as
(
 select t0.*,
        max(aa) over ( partition by id ) as mx
   from t0
)   
select id, aa, bb, update_day
  from t
 where aa = mx;

ID  AA  BB    UPDATE_DAY
--  --  ----  -----------
1   14  Foo   2019-08-04 -- provided this is inserted instead of your original one.
1   14  Baz   2019-08-04
2   27  Baz   2019-08-04

Demo

Barbaros Özhan
  • 59,113
  • 10
  • 31
  • 55
  • 1
    The first option using row_number() worked perfectly. It was quite efficient as well, my dataset has ~4,5m rows and it took faster than expected. Thank you! – Leonardo Freitas Aug 07 '19 at 12:07