5

I'm trying to use LAG in T-SQL to compute some lagging features. I got a little worried when the LAG reference page says that this function is non-deterministic. The reference page on function determinism says that "specifying an ORDER BY clause in a query does not change the determinism of a function that used in that query". However, I don't see why LAG would return different results under the same condition. If it does, why would people use it? Maybe I'm not interpreting "determinism" correctly? Thanks!

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
hlu
  • 347
  • 1
  • 2
  • 12
  • 2
    `LAG provides access to a row at a given physical offset that comes before the current row` Perhaps because the physical ordering can change from one execution of the query to the next, e.g. due to an INSERT or DELETE? – Eric J. Feb 19 '16 at 17:02
  • 1
    If `AVG` is regarded as deterministic then `LAG` should also be as long as the set of columns used in the partition by and order by is unique. Then there is no ambiguity and it will always return the same result for a given input set. – Martin Smith Feb 19 '16 at 17:31
  • Actually no. AVG will not change on index reorganization - LAT will. – TomTom Feb 19 '16 at 17:35
  • 4
    @TomTom - index reorganisation is irrelevant as long as the set of columns used in the partition by and order by is unique. `SUM` and `AVG` can certainly change dependant on order the values they are presented as here http://blogs.msdn.com/b/khen1234/archive/2005/05/13/417153.aspx – Martin Smith Feb 19 '16 at 17:40

2 Answers2

8

According to the MSDN documentation, nondeterministic functions may return different results each time they are called with a specific set of input values even if the database state that they access remains the same, so this is not related to data changes(INSERT, DELETE, UPDATE).

However, Eric is right regarding the physical sorting order. The physical sorting order can vary from one query to the other, for example when there are duplicate rows in the data. In that scenario LAG and LEAD can return different results depending on the chosen execution plan. On the other hand, the AVG function is deterministic, because it will always return the same results for the same data set regardless of sorting order.

Alex
  • 21,273
  • 10
  • 61
  • 73
2

In mathematics and physics, a deterministic system is a system in which no randomness is involved in the development of future states of the system. A deterministic model will thus always produce the same output from a given starting condition or initial state. https://en.wikipedia.org/wiki/Deterministic_system

The LAG function itself is not deterministic because its results can change depending on data state, Eric is correct. In some data models, and when applied correctly, it can be deterministic (like if you order by numeric key in your lag) but the function definition by itself is not deterministic.

Make sense?

TT.
  • 15,774
  • 6
  • 47
  • 88
saarrrr
  • 2,754
  • 1
  • 16
  • 26
  • Makes sense but the same page linked by the OP also claims that "All aggregate functions are deterministic unless they are specified with the OVER and ORDER BY clauses." - the same argument about data state applies to those... – Martin Smith Feb 19 '16 at 17:36
  • OVER and ORDER BY is required for LAG to work. Plus, LAG is a window function, it does not aggregate. – saarrrr Feb 19 '16 at 17:50