The documentation says that "NULL are considered and may be" returned by an any_value
statement. However, I am quite sure the documentation is wrong here. In the current implementation, which was tested on 13th August 2022, the any_value
will return the first value of that column. However, if the table does not have an order by
specified, the sorting may be random due to processing of the data on several nodes.
For testing a large table of nulls is needed. To generate_array
will come handy for that. This array will have several entries and the value zero for null. The first 1 million entries with value zero are generated in the table tmp
. Then table tbl
adds before and after the [-100,0,-90,-80,3,4,5,6,7,8,9]
the 1 million zeros. Finally, calculating NULLIF(x,0) AS x
replaces all zeros by null
.
Several test of any_value
using the test table tbl
are done. If the table is not further sorted, the first value of that column is returned: -100
.
WITH
tmp AS (SELECT ARRAY_AGG(0) AS tmp0 FROM UNNEST(GENERATE_ARRAY(1,1000*1000))),
tbl AS (
SELECT
NULLIF(x,0) AS x,
IF(x!=0,x,NULL) AS y,
rand() AS rand
FROM
tmp,
UNNEST(ARRAY_CONCAT(tmp0, [0,0,0,0,0,-100,0,-90,-80,3,4,5,6,7,8,9] , tmp0)) AS x )
SELECT "count rows", COUNT(1) FROM tbl
UNION ALL SELECT "count items not null", COUNT(x) FROM tbl
UNION ALL SELECT "any_value(x): (returns first non null element in list: -100)", ANY_VALUE(x) FROM tbl
UNION ALL SELECT "2nd run", ANY_VALUE(x) FROM tbl
UNION ALL SELECT "3rd run", ANY_VALUE(x) FROM tbl
UNION ALL SELECT "any_value(y)", ANY_VALUE(y) FROM tbl
UNION ALL SELECT "order asc", ANY_VALUE(x) FROM (Select * from tbl order by x asc)
UNION ALL SELECT "order desc (returns largest element: 9)", ANY_VALUE(x) FROM (Select * from tbl order by x desc)
UNION ALL SELECT "order desc", ANY_VALUE(x) FROM (Select * from tbl order by x desc)
UNION ALL SELECT "order abs(x) desc", ANY_VALUE(x) FROM (Select * from tbl order by abs(x) desc )
UNION ALL SELECT "order abs(x) asc (smallest number: 3)", ANY_VALUE(x) FROM (Select * from tbl order by abs(x) asc )
UNION ALL SELECT "order rand asc", ANY_VALUE(x) FROM (Select * from tbl order by rand asc )
UNION ALL SELECT "order rand desc", ANY_VALUE(x) FROM (Select * from tbl order by rand desc )
This gives following result:

- The first not
null
entry, -100
is returned.
- Sorting the table by this column causes the
any_value
to always return the first entry
- In the last two examples, the table is ordered by random values, thus
any_value
returns random entries
If the dataset is larger than 2 million rows, the table may be internally split to be processed; this will result in a not ordered table. Without the order by
command the first entry on the table and thus the result of any_value
cannot be predicted.
For testing this, please replace the 10th line by
UNNEST(ARRAY_CONCAT(tmp0,tmp0,tmp0,tmp0,tmp0,tmp0,tmp0,tmp0, [0,0,0,0,0,-100,0,-90,-80,3,4,5,6,7,8,9] , tmp0,tmp0)) AS x )