2

I have a trouble using LAST_VALUE() window function in Google BigQuery.

In my understanding, the following two columns should return the same results, but actually they return different results and it seems the one with FIRST_VALUE() is correct.

SELECT
  FIRST_VALUE(status) OVER (PARTITION BY userId ORDER BY timestamp DESC), 
  LAST_VALUE(status) OVER (PARTITION BY userId ORDER BY timestamp ASC)
FROM
  [table]

Did I make any mistake?

Sherantha
  • 462
  • 3
  • 19
yuzwyy
  • 87
  • 6
  • No null or duplicate. To be worse, the one using LAST_VALUE returns different results for the same userId. – yuzwyy Jan 21 '16 at 09:19
  • it's a feature, not bug.. check the answer below – Felipe Hoffa Jan 21 '16 at 09:35
  • You have a low rate. Important on SO, you have to mark accepted answers by using the tick on the left of the posted answer, below the voting. This will increase your rate. See how this works by visinting this link: http://meta.stackoverflow.com/questions/5234/how-does-accepting-an-answer-work#5235 – Pentium10 Jan 21 '16 at 11:44
  • Possible duplicate of [SQL: Last\_Value() returns wrong result (but First\_Value() works fine)](https://stackoverflow.com/questions/15388892/sql-last-value-returns-wrong-result-but-first-value-works-fine) – Harish Sripathi Nov 20 '19 at 15:14

1 Answers1

2

There's a subtlety on how OVER() functions work when they have an (ORDER BY): They work incrementally.

See this query:

SELECT x, y, 
       FIRST_VALUE(x) OVER(ORDER BY y) first, 
       LAST_VALUE(x) OVER(ORDER BY y DESC) last,
       SUM(x) OVER() plain_sum_over, 
       SUM(x) OVER(ORDER BY y) sum_over_order
FROM (SELECT 1 x, 1 y),(SELECT 2 x, 2 y),(SELECT 3 x, 3 y),(SELECT 4 x, 4 y)

enter image description here

The plain_sum_over and sum_over_order reveal the secret: With an ORDER BY you get incremental results - and that's what you are witnessing in your results.

Felipe Hoffa
  • 54,922
  • 16
  • 151
  • 325
  • Thanks! I understand SUM() works incrementally, but LAST_VALUE() also works incrementally? If I want to get the last status of each user, what is the best way? – yuzwyy Jan 21 '16 at 10:03
  • 2
    OK. I think now I understood the behavior. So, when you have ORDER BY in OVER clause without window frame clause, the default window frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Is this correct? – yuzwyy Jan 21 '16 at 10:29
  • For the additional question, please start a new question - and accept this answer if it answers the original one. – Felipe Hoffa Jan 21 '16 at 11:05