NULLIF produces incorrect output in Amazon Redshift?

Question

This following code should produce the same values for columns lag and lag2:

CREATE TABLE bug1 (
  id INT,
  value INT
);

INSERT INTO bug1 VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3);

SELECT
  id,
  value,
         (LAG(value) OVER (PARTITION BY id ORDER BY value))     lag,
  NULLIF((LAG(value) OVER (PARTITION by id ORDER BY value)), 0) lag2
FROM bug1
ORDER BY id, value;

Indeed, running it on PostgreSQL 9 produces the expected output:

id  v  lag  lag2
1   1       
1   2   1   1
1   3   2   2
2   1       
2   2   1   1
2   3   2   2

But, running it on Redshift produces incorrect results:

id  v  lag  lag2
1   1       2
1   2   1   3
1   3   2   
2   1       2
2   2   1   3
2   3   2

Is this expected? I asked on AWS forums but no reply.

score 1 · Answer 1 · answered Jul 27 '16 at 22:07

This is not expected; looks like a bug to me. If it was intended, then using LEAD() instead and inverting the ORDER BY clause, we'd expect the exact same result as in the original LAG() query:

SELECT
  id,
  value,
         (LEAD(value) OVER (PARTITION BY id ORDER BY value desc))     lead,
  NULLIF((LEAD(value) OVER (PARTITION by id ORDER BY value desc)), 0) lead2
FROM bug1
ORDER BY id, value;

But this is what we get instead:

 id | value | lead | lead2
----+-------+------+-------
  1 |     1 |      |
  1 |     2 |    1 |     1
  1 |     3 |    2 |     2
  2 |     1 |      |
  2 |     2 |    1 |     1
  2 |     3 |    2 |     2

Yes, just ran this on my Redshift cluster. I can reproduce the error you're seeing for `LAG()` as well. — moertel, Jul 27 '16 at 22:16

NULLIF produces incorrect output in Amazon Redshift?

1 Answers1