0

I was working with a partner on pulling some data from our data warehouse, and for some reason, we were getting different results from our queries when they seem to be logically equivalent.

I've attached a simplified and anonymized version of the queries we have performed. They all perform the query, filtering the results using:

NOT IN

SELECT COUNT(DISTINCT organization_id)
FROM clients.organization
WHERE organization_id NOT IN (
  SELECT owner_organization_id 
  FROM clients_enhanced.organization_team
);

LEFT OUTER JOIN ... WHERE NULL

SELECT COUNT(DISTINCT o.organization_id)
FROM clients.organization AS o
LEFT OUTER JOIN (
  SELECT DISTINCT owner_organization_id AS organization_id
  FROM clients_enhanced.organization_team
  ) AS sub
ON o.organization_id = sub.organization_id
WHERE sub.organization_id IS NULL
;

EXCEPT

SELECT COUNT(distinct sub.organization_id)
FROM (
  SELECT o.organization_id AS organization_id
  FROM clients.organization AS o

  EXCEPT

  SELECT DISTINCT owner_organization_id
  FROM clients_enhanced.organization_team
) AS sub
;

These queries yield the following results:

NOT IN

 count
-------
     0
(1 row)

LEFT OUTER JOIN ... WHERE NULL

 count
--------
 372421
(1 row)

EXCEPT

 count
--------
 372421
(1 row)

Are the queries above not logically equivalent? Or am I missing something completely?

For reference, these queries were performed on Amazon Redshift

kamil1
  • 68
  • 9
  • 1
    Can `owner_organization_id` be null? If yes, you need to exclude `NULL` values in the sub-select for the `NOT IN` solution –  Jun 22 '16 at 20:55
  • Yes it can be `NULL`. And yes, excluding `NULL` in the `NOT IN` subquery does return the correct result. I'm assuming this is because the DBMS checks to see if each value in the subquery is `<>` to whatever you are checking against, and when comparing against a `NULL` value get's `false`? Regardless, if you post this comment as the answer, I will mark it as the solution. – kamil1 Jun 22 '16 at 22:33
  • Say why you think they are equivalent with justification referencing authoritative documentation, or you are just asking for yet another language definition with a bespoke tutorial & are not giving us any idea what your current misconceptions are. Also: Tthis is a faq. . Please before considering posting read your textbook and/or manual & google any error message or many clear, concise & precise phrasings of your question/problem/goal, with & without your particular strings/names & site:stackoverflow.com & tags; read many answers. Also: Please for code questions give a [mre]. – philipxy Sep 29 '20 at 20:40

1 Answers1

0

Likely because Column was not prefixed with a table alias, it got confused.

SELECT COUNT(DISTINCT organization_id)
FROM clients.organization
WHERE organization_id NOT IN (
  SELECT c. owner_organization_id 
  FROM clients_enhanced.organization_team as c
);
NearHuscarl
  • 66,950
  • 18
  • 261
  • 230
Nathan
  • 1