3

Let's take a basic deterministic function and a non-deterministic one:

  • ABS(2)
  • NOW()

What about the third case of something that may change but we're not sure, such as:

SELECT
    ABS(2) -- deterministic
  , NOW()  -- not
  , getTableCount(otherTbl) -- function that does a 'SELECT count(1) FROM table'
FROM
    table

Basically, if a row is inserted or deleted, the subselect's value will change. So would that one be considered deterministic? The result should always be the same...unless the underlying data is changed, so it's almost like a third case. Or, is volatile/non-deterministic just taken to mean 'if it ever changes, ever, ever, ever, under any circumstances, then it's volatile.' ?

David542
  • 104,438
  • 178
  • 489
  • 842
  • There is a problem with the definition of non deterministic, the real definition is that no randomness is involved. now has no randomness. Even programming random is not really non deterministic as it is pseudo rando and output can be calculated and expected – Daniel Rapaport Nov 23 '21 at 22:12
  • There's no problem with using any of those in a scalar function - is that your question? – Stu Nov 23 '21 at 22:14
  • @Stu -- no, I'm writing up some docs on udfs and want to make sure I'm clear on terminology. – David542 Nov 23 '21 at 22:15
  • Well now() is a classic non-deterministic, the other two are deterministic, ie, if the data does not change the output does not change - a count() is deterministic. – Stu Nov 23 '21 at 22:34
  • @Stu I suppose there are three ways it might be broken down (at least as used by Postgres) -- https://www.postgresql.org/docs/9.5/xfunc-volatility.html -- Immutable (ABS) > Stable (getTableCount) > Volatile (NOW()) -- if my understanding is correct on that. – David542 Nov 23 '21 at 22:37
  • @stu also I think the word `stable` that postgres uses is most in line with 'deterministic', does that seem about right? – David542 Nov 23 '21 at 22:38
  • 1
    That's really just semantics, I'd agree for this definition stable and deterministic could probably be considered synonymous. – Stu Nov 23 '21 at 22:41
  • 1
    I guess you are free to choose the interpretation that suits the logic of your code better. E.g. the SQL Server docs clearly [define](https://learn.microsoft.com/en-us/sql/relational-databases/user-defined-functions/deterministic-and-nondeterministic-functions?view=sql-server-ver15) deterministic function as returning "same result with specific set of input values and given the same state of the database", according to which your `getTableCount` is deterministic. – GSerg Nov 30 '21 at 20:16
  • @GSerg thanks, that makes sense. It seems there should be another definition then that produces the "same result with specific set of input values regardless of the state of the database", for example, `ABS(2)`. It may seem trivial, but it is required for something I'm doing that involves caching a data-set, and often I can make assumptions about whether the cache is correct based on various functions that do not depend on the values (or more frequently, number of rows) in a possibly externally-managed database. – David542 Nov 30 '21 at 20:17
  • 1
    @David542 I believe you have two degrees of freedom here, "deterministic / non-deterministic" and "accesses data / only accesses arguments". I don't think there's a separate name for "deterministic that does not access data", but you can apply both tags at once to have clear description. (And if it does access data, but does not change its result regardless of what that data is, then it probably does not need to access the data.) This is now CLR functions are tagged in SQL Server too (with these two separate tags). – GSerg Nov 30 '21 at 20:21
  • @GSerg thanks, could you please clarify what you mean in your last sentence: `This is now CLR functions are tagged in SQL Server too (with these two separate tags).` ? – David542 Nov 30 '21 at 20:42
  • 1
    This is *how* CLR functions are tagged. When you write one in a .NET language, you can tag it with "deterministic" and/or "accesses data" tags separately. SQL Server will then use than information for its caching behaviours. – GSerg Nov 30 '21 at 20:44
  • 1
    @David542 The last example has side effects/is stateful meaning it could return different values for the same input parameters, meaning it cannot be treated as immutable. The question is if immutable is synonym of deterministic. [Categorize Your Function as Volatile or Immutable](https://docs.snowflake.com/en/sql-reference/external-functions-best-practices.html#categorize-your-function-as-volatile-or-immutable) and [DETERMINISTIC Clause](https://docs.oracle.com/en/database/oracle/oracle-database/21/lnpls/DETERMINISTIC-clause.html#GUID-6AECC957-27CC-4334-9F43-0FBE88F92654) – Lukasz Szozda Nov 30 '21 at 21:01

1 Answers1

2

There are different interpretations for determinism, even when restricted to the SQL functions domain. It depends on what determinism consumer needs and assumes.

The usual definition of determinism is that a deterministic function always return the same value when confronted with same input argument values for its parameters.

If the function consumes state, it would implictly consider it as an extra input paramenter. The original function(p1,...pn) would become function(p1,...pn,state). But in this case if two different states are compared, then the inputs would not be the same, so we couldn't talk about determinism anymore. Knowing this, we will use the terms state-sensitive-determinism and state-insensitive-determinism to differentiate those cases.

Our state-insensitive-determinism is equivalent of PostgreSQL's IMMUTABLE (PostgreSQL is a good comparinson as it avoids using the term determinism to avoid confusion, as it is possible to see in postgresql docs). In this case, the function always returns the same value no matter the state (example select 1+2). It is the most strict form of determinism and consumers usually take it for granted - query optimizers for example can substitute them by their result (select 1+2 would become select 3). In those cases, the state does not influence the result. So, even if we put state as an extra parameter, the function remains resulting the same.

When the result does not change facing the same state but risk changing otherwise we have our state-sensitive-determinism or PostgreSQL's STABLE (example select v,sum(v) over () from tbl where v>1000;). Determinism here is on a gray area. A query optimizer consumer sees it as deterministic because since query lives a well defined state, at least in transactionable databases, it is fine to calculate it only once instead of many times because future calculations would result the same. But a materialized calculated column or index can't accept this same function as deterministic because a little change in the state would turn invalid all its pre-calculated and stored values. In this scenario resides the OP's getTableCount(otherTbl). For a query optimizer its deterministism is enough to avoid extra calculations, for materialized calculated values it is not enough and can't be accepted as a source of value for being written. If we use the state as an extra parameter, the result may change between different states.

If we consume a value that is generated based on some uncontrolled state like random() (at least when we don't choose seed and pseudorandom function), then we can't achieve determinism. In PostgreSQL's terms, this would be VOLATILE. A VOLATILE is undeterministic by nature because it can have different values even in the same table scan, as it is the case of random() (For time related functions see Postgres now() timestamp doesn't change, when script works, the time may be the transaction time or can be the query time, what would impact your view of what is deterministic).

MySQL have different keywords, NOT DETERMINISTIC DETERMINISTIC, READS SQL DATA MODIFIES SQL DATA (similiar to PostgreSQL's LEAKPROOF), NO SQL CONTAINS SQL as seen on mysql docs, with the same objective of PostgreSQL - giving hints to the specific consumer, be it a query optimizer or a materialized value, of whether it would adapt its behaviour depending on its interpretation of determinism. The database vendors probably leave this responsibility to the users because leaving them the responsibility of determining the causal graph what influences what would be complex and problematic.

When vendors talk about determinsim they will probably be talking about one of those that we said. In sqlserver docs microsoft says that state must be the same, so they are probably talking about our state-sensitive-determinism. In sqlite docs otherwise it is taken the state-insensitive-determinism approach, where functions that must result equally even in different states to be considered deterministic, because they would follow stricter rules. Oracle implicitly follows the same sqlite flavor in their docs.

Our transactionable databases will eventually use some mechanism like MVCC to hold state in a transaction. In this case we could think the transactionTimestamp as a input to our functions. But if we take more complex cases like distributed databases, then our determinism can be harder to achieve and eventualy it would have to consider consensus algorithms.

brunoff
  • 4,161
  • 9
  • 10
  • 1
    thanks so much for your time. The `state-sensitive` vs `state-insensitive` is the perfect distinction. And thanks for all the links as well! – David542 Dec 03 '21 at 20:51
  • 1
    thanks for accepting the answer and thanks for correcting me, I was using `unsensitive` instead of `insensitive` – brunoff Dec 03 '21 at 21:02