2

I am writing a custom Presto Aggregation Function that produces the correct result if (and only if) the values are ordered in ascending order by the value that I am aggregating on. i.e.

The following will work:

SELECT key, MY_AGG_FUNC(value ORDER BY value ASC) FROM my_table GROUP BY key

The following will yield an incorrect result:

SELECT key, MY_AGG_FUNC(value) FROM my_table GROUP BY key

When developing the MY_AGG_FUNC, is there a way to enforce ORDER BY value ASC internally without relying on the caller to add it to the query?

As an alternative, is there a way to throw an Exception if the user does not specify the ORDER BY at all (or an incorrect ordering)?

Piotr Findeisen
  • 19,480
  • 2
  • 52
  • 82
Marsellus Wallace
  • 17,991
  • 25
  • 90
  • 154
  • This looks consistent with the behavior of other built-in aggregate functions, and of SQL in general: if the caller wants an ordering, he should be explicit about it. – GMB Dec 20 '20 at 00:11
  • Thanks! Could you point me to a built-in aggregate function that enforces that at runtime? – Marsellus Wallace Dec 20 '20 at 00:36
  • I meant that *not* implementing a default sort strategy is consistent with what built-ins do. Some of them accept an `ORDER BY` clause (`arry_agg()`, `string_agg()`, ...) - and if the caller does not specify it, the ordering of the results is not defined. – GMB Dec 20 '20 at 00:38
  • I am a bit confused about the implementation of the `@CombineFunction` though. Will the group/partition be split any further? If so, one partition might be "in the middle" and not start processing the partition from the smallest value. I am trying to implement a "fold" operation here. – Marsellus Wallace Dec 20 '20 at 00:55

1 Answers1

2

When developing the MY_AGG_FUNC, is there a way to enforce ORDER BY value ASC internally without relying on the caller to add it to the query?

There is no way to do that.

As an alternative, is there a way to throw an Exception if the user does not specify the ORDER BY at all (or an incorrect ordering)?

There is no way to do that, other than checking within the aggregation function implementation that received values are in ascending order.

Piotr Findeisen
  • 19,480
  • 2
  • 52
  • 82
  • 1
    So I could just keep the "previous value" in the aggregation function "state" and throw an exception if the "next value" is strictly less than "previous value". That would enforce correctness I guess – Marsellus Wallace Dec 19 '20 at 21:27
  • What's the role of the `@CombineFunction` if I do require the input to be sorted? i.e. the implementation is like a "fold" (it must start computing from the very first row). Is there a chance that internal partitioning will affect the end result? – Marsellus Wallace Dec 21 '20 at 16:42
  • Correct, but you can detect this as well. – Piotr Findeisen Dec 22 '20 at 11:53