1

Is the function MAX_BY() deterministic. If I use MAX_() for two different columns both depending on a third one, will I get the same row result?

The presto documentation doesn't mention this. This documentation about mysql mention that it is not, so I'm not sure where to find this info.

I tested quickly with the following:

WITH my_table(id, arr, something) AS (
    VALUES
        (1, ARRAY['one'], 0.0),
        (2, ARRAY['two'], 0.0),
        (3, ARRAY['three'], 0.0),
        (4, ARRAY['four'], 0.0),
        (5, ARRAY['five'], 0.0),
        (6, ARRAY[''], 0.0)
)
SELECT
    MAX_BY(id,something),
    MAX_BY(arr,something)
FROM my_table

It returned the first row, so it doesn't feel arbitrary but also does not prove things.

Anyone out there able to help?

There is a related question to return multiple columns from a single MAX_BY() so I'm thinking that I need to use that solution to guarantee the attribute of the same row is selected: max_by with multiple return columns

user1527152
  • 946
  • 1
  • 13
  • 37
  • No, in the case of ties, the result is arbitrary. It may appear to be deterministic, but that's not defined behavior and may change at some point. – Martin Traverso Nov 06 '20 at 19:01

2 Answers2

3

No, in the case of ties, the result of max_by and min_by is arbitrary. It may appear to be deterministic, but that's not defined behavior and may change at some point.

If you want all the values to be consistent, you have to use the trick you referred to, where you pack all the columns of interest in a single value of type ROW:

SELECT max_by((x1, x2, x3), y) r
FROM (...) t(y, x1, x2, x3)
Martin Traverso
  • 4,731
  • 15
  • 24
1

It is probably safer, and more efficient as well, to use window functions:

select *
from (
    select t.*, row_number() over(order by something desc) rn
    from my_table t
) t
where rn = 1

For this simple case, a row-limiting clause is actually good enough:

select *
from my_table 
order by something desc
limit 1

Both query guarantee that the returned values all belong to the same row.

None, however, is deterministic, in the sense that consecutive executions of the same query might return a different row. If you want a stable result, then you need a column (or a set of columns) that can be used to uniquely identify each row: adding id to the order by clause would be just fine here.

GMB
  • 216,147
  • 25
  • 84
  • 135