I was working with a partner on pulling some data from our data warehouse, and for some reason, we were getting different results from our queries when they seem to be logically equivalent.
I've attached a simplified and anonymized version of the queries we have performed. They all perform the query, filtering the results using:
NOT IN
SELECT COUNT(DISTINCT organization_id)
FROM clients.organization
WHERE organization_id NOT IN (
SELECT owner_organization_id
FROM clients_enhanced.organization_team
);
LEFT OUTER JOIN ... WHERE NULL
SELECT COUNT(DISTINCT o.organization_id)
FROM clients.organization AS o
LEFT OUTER JOIN (
SELECT DISTINCT owner_organization_id AS organization_id
FROM clients_enhanced.organization_team
) AS sub
ON o.organization_id = sub.organization_id
WHERE sub.organization_id IS NULL
;
EXCEPT
SELECT COUNT(distinct sub.organization_id)
FROM (
SELECT o.organization_id AS organization_id
FROM clients.organization AS o
EXCEPT
SELECT DISTINCT owner_organization_id
FROM clients_enhanced.organization_team
) AS sub
;
These queries yield the following results:
NOT IN
count
-------
0
(1 row)
LEFT OUTER JOIN ... WHERE NULL
count
--------
372421
(1 row)
EXCEPT
count
--------
372421
(1 row)
Are the queries above not logically equivalent? Or am I missing something completely?
For reference, these queries were performed on Amazon Redshift