PostgreSQL: WHERE IN and NOT WHERE IN

Question

I have two tables A and B, A referred to B by field A.id_b B.id, so that any id_b value present in B.id.

I have three queries: First one:

SELECT COUNT(b.id)
FROM B b
WHERE b.id NOT IN (
    SELECT a.id_b
    FROM A a)

It gives me 0;

Second one, difference only in NOT:

SELECT COUNT(b.id)
FROM B b
WHERE b.id IN (
    SELECT a.id_b
    FROM A a)

This query gives me: 1899

Third one:

SELECT COUNT(b.id)
FROM B b

And this query gives me 3599

SAME result in:

SELECT a.id_b
FROM A a

guaranty me spliting B.id on two sets, and count of elements in both sets must much total elements count, but I have: 1899 + 0 != 3599.

How that could be?

Yes, unique. Primary key. And A.id_b id foreign key, with NOT NULL constraint. — devdRew, Oct 15 '12 at 09:11
Side note: The last query does not have `COUNT` function. Is it intentionally? — Himanshu, Oct 15 '12 at 09:17
`A.id_b`, Two table, having for one column in each - `A: [id_b]` and `B: [id]`. — devdRew, Oct 15 '12 at 09:26
Instead of editing a question. You should (must) add your own answer and accept it. — Himanshu, Oct 15 '12 at 09:58

score 4 · Accepted Answer · answered Oct 15 '12 at 11:24

4

Found the reason. There was records in A, which has NULL values in A.id_b. That's why query:

SELECT COUNT(b.id)
FROM B b
WHERE b.id NOT IN (
    SELECT a.id_b
    FROM A a)

was returning 0.

answered Oct 15 '12 at 11:24

devdRew

4,393
3
24
33

score 3 · Answer 2 · answered Oct 15 '12 at 09:01

3

This query tells us that table B has total of 3599 rows:

SELECT COUNT(b.id)
FROM B b

Next query tells us that every single one id from B was used in A:

SELECT COUNT(b.id)
FROM B b
WHERE b.id NOT IN (
    SELECT a.id_b
    FROM A a)

Further, this query tells is that table B has 1899 id's that are mentioned in table A:

SELECT COUNT(b.id)
FROM B b
WHERE b.id IN (
   SELECT a.id_b
   FROM A a)

The only possible explanation to this is that some B.id were used in table A as A.id_b more than once per row. Running this query will show list of all duplicate mentions of id_b in table A:

SELECT a.id_b
FROM A a
GROUP BY a.id_b
HAVING count(a.id_b) > 1

answered Oct 15 '12 at 09:01

mvp

111,019
13
122
148

Thanks for answer. Well, yeah, rows from B using more then ones in table A. But how that effect on `NOT IN` selection? I have `0` B rows, which is not used in A and `1899` rows which is used. Where is the rest of? – devdRew Oct 15 '12 at 09:13
In this query SELECT COUNT(b.id) FROM B b WHERE b.id IN ( SELECT a.id_b FROM A a) IN (...) implicitly run unique selection. And it only found 1899 unique ones – mvp Oct 15 '12 at 09:19
And that's right, my question is where is the rest of elements from B, why they not appearing in `NOT IN` query. – devdRew Oct 15 '12 at 09:25
1

Well, we can only conclude that 1700 'missing' B ids were never mentioned in table A. Is it possible that B has nulls? – mvp Oct 15 '12 at 09:38
Yes, but **WHY** the query `SELECT COUNT(b.id) FROM B b WHERE b.id NOT IN ( SELECT a.id_b FROM A a)` gives me 0 – devdRew Oct 15 '12 at 09:40
Thanks, found the solution. Updated my question with it. – devdRew Oct 15 '12 at 09:55
1

If B has nulls this can explain everything. In SQL, NULL handing can be considered weird. For example, nothing can be equal to NULL, even NULL itself. And NULL can never be member of any subset. And it cannot be NOT member of any subset. Strange, isn't it? :-) – mvp Oct 15 '12 at 10:07

PostgreSQL: WHERE IN and NOT WHERE IN

2 Answers2