0

I have this query written in Legacy SQL:

select
    nth(1, a) first_a, 
    nth(1, b) first_b
from (
    select *
    from
        (select 12 a, null b),
        (select null a, 54 b)
)

As a result I was expecting one row with values (12, null), but I got (12, 54) instead. In the documentation for NTH it says:

NTH(n, field)

Returns the nth sequential value in the scope of the function, where n is a constant. The NTH function starts counting at 1, so there is no zeroth term. If the scope of the function has less than n values, the function returns NULL.

There is nothing indicating that nulls would be ignored. Is this a bug in BigQuery?

Luka
  • 2,779
  • 3
  • 17
  • 32

1 Answers1

1

This is the important part in the documentation:

in the scope of the function

The scope is normally a "record" (in legacy SQL terms), where you fetch the nth value within a repeated field. As written, though, this query has the effect of using NTH as an aggregate function. The values in the group have no well-defined order, but it so happens that NULL is ordered after the non-null values, so NTH(1, ...) gives a non-null value. Try using 2 as the ordinal instead, for instance:

select
    nth(2, a) first_a, 
    nth(2, b) first_b
from (
    select *
    from
        (select 12 a, null b),
        (select null a, 54 b)
)

This returns null, null as output.

With that said, to ensure well-defined semantics in your queries, the best option is to use standard SQL instead. Some analogues to the NTH operator when using standard SQL are:

  • The array bracket operator, e.g. array_column[OFFSET(0)] to get the first element in an array.
  • The NTH_VALUE window function, e.g. NTH_VALUE(x, 1) OVER (PARTITION BY y ORDER BY z). See also FIRST_VALUE and LAST_VALUE.
Elliott Brossard
  • 32,095
  • 2
  • 67
  • 99
  • "The values in the group have no well-defined order" - not sure about that. I can order records before calling `nth`, and it will work as expected (taking values from the first record after ordering) except for null values. – Luka Apr 10 '18 at 17:00
  • Sure, you can get consistent results, but it doesn't mean that the results are well-defined or won't change in the future. If a hash map has some particular iteration order, there is no guarantee it will always be that way, for instance. – Elliott Brossard Apr 10 '18 at 17:40
  • If you `select nth(1, b), first(b), last(b) from ...` you will get (54, 54, 54). Does it mean that each of this 3 functions has its own ordering defined which can be different from the other two? – Luka Apr 10 '18 at 18:58
  • The behavior is not well-defined. Why does it matter if it is consistent? For well defined semantics, use standard SQL in BigQuery instead, or don't use aspects of legacy SQL where the semantics are ambiguous. – Elliott Brossard Apr 10 '18 at 20:58
  • Are you telling me that those functions are there and I can use them, but I should not expect consistency? It is ok if the results are kinda random? Also results are not in accordance with the documentation, but who cares because "Why does it matter if it is consistent?" – Luka Apr 10 '18 at 21:08
  • Legacy SQL does not have well defined semantics in some areas. This is one of them. – Elliott Brossard Apr 10 '18 at 21:12
  • Ok, thanks! Then, I guess I have found a bug in Legacy SQL – Luka Apr 10 '18 at 21:15