When I use FIRST_VALUE on a data set that I construct by hand I get one result, and when I use it on a data set that results from a left join, I get a different result - even though the data sets appear to me to contain the exact same data values. I've reproduced the issue with a simple data set below.
Can someone tell me if I have misunderstood something?
This SQL produces the expected result, that FIRST_VALUE is NULL and LAST_VALUE is 30.
SELECT
agroup,
aval,
FIRST_VALUE(aval) OVER (PARTITION BY agroup ORDER BY aval ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) fv,
LAST_VALUE(aval) OVER (PARTITION BY agroup ORDER BY aval ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) lv
FROM
(
SELECT 1 agroup, 10 aval
UNION ALL SELECT 1, NULL
UNION ALL SELECT 1, 30
) T
This SQL uses a LEFT JOIN that results in the same data set as above, but FIRST_VALUE appears to ignore the NULL.
SELECT
agroup,
aval,
FIRST_VALUE(aval) OVER (PARTITION BY agroup ORDER BY aval ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) fv,
LAST_VALUE(aval) OVER (PARTITION BY agroup ORDER BY aval ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) lv
FROM
(
SELECT
T1.agroup,
T1.akey,
T2.aval
FROM
(
SELECT 1 agroup, 1 akey
UNION ALL SELECT 1, 2
UNION ALL SELECT 1, 3
) T1
LEFT JOIN
(
SELECT 1 akey, 10 aval
UNION ALL SELECT 3,30
) T2 ON T1.akey = T2.akey
) T
I can also show that the left join behavior is different when using a table variable vs. a CTE. When using a CTE to generate the data, FIRST_VALUE ignores the NULL. Using the exact same SQL but putting the results in a table variable or a temporary table results in the NULL being taken into account.
With a CTE the SQL Server results don't include NULL in the FIRST_VALUE determination:
WITH T AS
(
SELECT
T1.agroup,
T1.akey,
T2.aval
FROM
(
SELECT 1 agroup, 1 akey
UNION ALL SELECT 1, 2
UNION ALL SELECT 1, 3
) T1
LEFT JOIN
(
SELECT 1 akey, 10 aval
UNION ALL SELECT 3,30
) T2 ON T1.akey = T2.akey
)
SELECT
agroup,
aval,
FIRST_VALUE(aval) OVER (PARTITION BY agroup ORDER BY aval ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) fv,
LAST_VALUE(aval) OVER (PARTITION BY agroup ORDER BY aval ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) lv
FROM
T
But with a table variable, it does:
DECLARE @T TABLE (agroup INT,akey INT,aval INT)
INSERT INTO
@T
SELECT
T1.agroup,
T1.akey,
T2.aval
FROM
(
SELECT 1 agroup, 1 akey
UNION ALL SELECT 1, 2
UNION ALL SELECT 1, 3
) T1
LEFT JOIN
(
SELECT 1 akey, 10 aval
UNION ALL SELECT 3,30
) T2 ON T1.akey = T2.akey
SELECT
agroup,
aval,
FIRST_VALUE(aval) OVER (PARTITION BY agroup ORDER BY aval ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) fv,
LAST_VALUE(aval) OVER (PARTITION BY agroup ORDER BY aval ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) lv
FROM
@T