2

Let's take the following two queries:

select greatest(1, val) from unnest([null]) as val
# null

And:

select sum(val) from unnest([1, null]) as val
# 1

My question is why is null handled differently between aggregate functions and normal scalar functions? Of course, the second query is more useful when null values are not counted. But I would think the first one would be more useful as well if it were to return 1 instead of null. If null is an 'unknown value', then wouldn't both functions have a supposedly unknown answer (i.e., null), not just the latter?

If there is a historical reason for this it would be great to know as well. Hopefully someone can shine some light on why there's a difference between the two.

Thomas Dickey
  • 51,086
  • 7
  • 70
  • 105
David542
  • 104,438
  • 178
  • 489
  • 842
  • 1
    Basically because that's what the SQL standard decided back in the day I think. When ANSI Warnings are on SQL Server will print a message telling you NULLs are ignored in the aggregation. I'm assuming they thought this would be more useful in practice even if logically somewhat inconsistent. – Martin Smith Jun 09 '23 at 23:27

2 Answers2

4

The aggregate functions are supposed to skip null values. Here is an excerpt from SQL-92 standard:

<set function specification>

[...]

Otherwise, let TX be the single-column table that is the result of applying the <value expression> to each row of T and eliminating null values. If one or more null values are eliminated, then a completion condition is raised: warning- null value eliminated in set function.

The greatest function on the other hand should return null if any argument is null¹.

My question is why is null handled differently between aggregate functions and normal scalar functions?

Generally speaking, scalar functions return null if any of their arguments is null². Here is Ask Tom's take on this exact same issue, I personally agree with everything that is said in the answer:

and Tom said...

pretty much all single row functions return NULL if any of their inputs to compare are null.

ops$tkyte%ORA10GR2> select round( 1.2, null ) from dual;

ROUND(1.2,NULL)
---------------
NULL

when you ask "what is the greatest of 1, NULL, 2 - the answer is "we don't know, because NULL is unknown"

Aggregates are defined to "skip nulls" (ANSI says so)

But functions in general that take a set of inputs, will return NULL when a deciding input is NULL.

http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions001.htm#i88893

If you call a SQL function with a null argument, then the SQL function automatically returns null

¹ Different implementations of the greatest function existed long before it was standardized. MySQL and Oracle return null if any argument is null; while PostgreSQL and SQL Server ignore null values.

² Conversely, vendor specific scalar functions exist that skip null values. One example is concat_ws (MySQL, PostgreSQL, SQL Server) function that is designed to simplify concatenation of (possibly null) strings using a separator.

Salman A
  • 262,204
  • 82
  • 430
  • 521
  • GREATEST has been included in the SQL standard for several weeks now! SQL-2023 has it, and LEAST too. https://en.wikipedia.org/wiki/SQL:2023 – jarlh Jun 21 '23 at 21:43
  • great, unfortunately specs are no available for free (afaik) so cannot say which of the above two behaviors is standard, – Salman A Jun 21 '23 at 21:47
  • If any of its values is null, null is returned. – jarlh Jun 21 '23 at 21:49
  • 1
    Sad for SQL Server and PostgreSQL. I'll revise the answer in a few hours. Thanks for correction. – Salman A Jun 21 '23 at 21:54
1

Without specific citation, reason may be due to a comparison function like GREATEST versus a computing function like SUM.

In ANSI SQL, comparing any value (using operators such as <, >, =, <>) to NULL should always return unknown or NULL. Interestingly, different RDBMSs may not by default follow this rule completely so WHERE my_column <> NULL can retun non-zero rows. See discussion in SQL Server's SET ANSI_NULLS mode. Also, see Postgres' transform_null_equals mode.

On other hand, also in ANSI SQL, SUM like most aggregate functions except COUNT(*) ignore NULLs. Hence, you observe the different behaviors.

Parfait
  • 104,375
  • 17
  • 94
  • 125