2

I try a query which test NULL NOT IN Empty_Relation on Postrgresql, Spark and I got different results.

select count(*) from
(select 1)
where null not in
(a empty relation)

Postgresql outputs 1. The other outputs 0.

I understand the NULL behaviour of NOT IN, but my the subquery is empty relation, this situation seems more interesting. There are a lot of posts discussing NOT IN but I don't find anything related to NOT IN Empty_Relation.

So my question is more like does ANSI SQL define this behavior or this is actually a grey area, both answers could be accepted.

Bostonian
  • 615
  • 7
  • 16

2 Answers2

4

tl;dr: PostgreSQL is correct.

This is what the SQL specification says about this behavior:

4) The expression RVC NOT IN IPV is equivalent to NOT ( RVC IN IPV )

5) The expression RVC IN IPV is equivalent to RVC = ANY IPV

So, NULL NOT IN (<empty relation>) is equivalent to NOT (NULL = ANY (<empty relation>))

Then, it goes on to say:

The result of R <comp op> <quantifier> T is derived by the application of the implied <comparison predicate> R <comp op> RT to every row RT in T.

[...]

d) If T is empty or if the implied <comparison predicate> is False for every row RT in T, then R <comp op> <some> T is False.

(Note: <some> is either ANY or SOME -- they both mean the same).

By this rule, since T is empty, NULL = ANY (<empty>) is False, so NOT (NULL = ANY (<empty relation>) is True.

Martin Traverso
  • 4,731
  • 15
  • 24
  • To me it's helpful to say that NULL means "undefined". When I asked whether "some undefined value" is `IN (... empty relation ...)`, there is no doubt it _isn't_ there. – Piotr Findeisen Nov 04 '19 at 08:36
2

I'm pretty sure Postgres is correct.

Although almost every comparison with NULL returns NULL, you have found an exception. If the set is empty, then nothing is in the set. That is, any value is NOT in the set regardless of the value.

Remember, the semantics of NULL mean "unknown" value -- not missing value. "Unknown" means that it can take on any value. The expression <anything> not in (<empty set>) is true, regardless of the value of <anything>.

Incidentally, Postgres is not alone in this behavior. A cursory look shows that SQL Server and Oracle also return 1 for equivalent queries.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786