I am unable to understand the result of the following query.
SELECT COUNT(*) FROM profiles
WHERE profiles.status IN ('abc', 'man')
AND profiles.id IN (
SELECT artifacts.item_id FROM artifacts
WHERE artifacts.deleted_at IS NULL
AND artifacts.item_type = 'Profile'
AND artifacts.upload_type = 'bill'
);
count
-------
12514
(1 row)
The above query appears to count some profiles more than once (those for which artifacts has multiple records). When I run the same query with DISTINCT, I get the correct count, shown below.
SELECT COUNT(DISTINCT(id)) FROM profiles
WHERE profiles.status IN ('abc', 'man')
AND profiles.id IN (
SELECT artifacts.item_id FROM artifacts
WHERE artifacts.deleted_at IS NULL
AND artifacts.item_type = 'Profile'
AND artifacts.upload_type = 'bill'
);
count
-------
12157
(1 row)
Artifacts can have more than one record for the same profile. But as per my understanding, an IN subquery should not cause any profile to be counted more than once. Am I right, or is there anything I am missing?
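To make my expectation concrete, here is a minimal sketch on made-up data. The tables demo_profiles and demo_artifacts and their rows are hypothetical and are not my real schema. It contrasts the IN subquery, which I expect to test each profile row only once, with a JOIN, which I would expect to produce one row per matching artifact.

```sql
-- Hypothetical toy tables for illustration only (not the real schema).
CREATE TEMP TABLE demo_profiles (id int PRIMARY KEY, status text);
CREATE TEMP TABLE demo_artifacts (item_id int, upload_type text);

INSERT INTO demo_profiles VALUES (1, 'abc'), (2, 'man'), (3, 'xyz');
-- Profile 1 deliberately has two matching artifacts.
INSERT INTO demo_artifacts VALUES (1, 'bill'), (1, 'bill'), (2, 'bill'), (3, 'bill');

-- IN (subquery): each profile row is checked once against the set of item_ids,
-- so profiles 1 and 2 are counted once each (count = 2).
SELECT COUNT(*) FROM demo_profiles
WHERE status IN ('abc', 'man')
AND id IN (SELECT item_id FROM demo_artifacts WHERE upload_type = 'bill');

-- JOIN: one output row per matching artifact, so profile 1 contributes
-- two rows and profile 2 one row (count = 3).
SELECT COUNT(*) FROM demo_profiles p
JOIN demo_artifacts a ON a.item_id = p.id
WHERE p.status IN ('abc', 'man') AND a.upload_type = 'bill';
```

My real query uses the IN form, so I expected it to behave like the first query here and not like the second.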
UPDATE:
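I tried splitting the query into its two filtering conditions and running each one separately. Each condition works fine on its own: COUNT(*) and COUNT(DISTINCT(id)) return the same value. Please find the results below.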
=> SELECT COUNT(*) FROM profiles WHERE profiles.id IN (
SELECT artifacts.item_id FROM artifacts
WHERE artifacts.deleted_at IS NULL
AND artifacts.item_type = 'Profile'
AND artifacts.upload_type = 'bill');
count
-------
22664
(1 row)
=> SELECT COUNT(DISTINCT(id)) FROM profiles WHERE profiles.id IN (
SELECT artifacts.item_id FROM artifacts
WHERE artifacts.deleted_at IS NULL
AND artifacts.item_type = 'Profile'
AND artifacts.upload_type = 'bill');
count
-------
22664
(1 row)
=> SELECT COUNT(DISTINCT(id)) FROM profiles
WHERE profiles.status IN ('abc', 'man');
count
-------
20109
(1 row)
=> SELECT COUNT(*) FROM profiles
WHERE profiles.status IN ('abc', 'man');
count
-------
20109
So the duplication occurs only when the two IN conditions are used in conjunction. Is anyone familiar with such a case?
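One way to narrow this down might be to list the ids that the combined query returns more than once. This is only a sketch of a diagnostic, assuming the same tables and filters as above; I have not included any output from it.

```sql
-- List profile ids that appear more than once in the combined result,
-- together with how many times each one appears.
SELECT profiles.id, COUNT(*) AS occurrences
FROM profiles
WHERE profiles.status IN ('abc', 'man')
AND profiles.id IN (
    SELECT artifacts.item_id FROM artifacts
    WHERE artifacts.deleted_at IS NULL
    AND artifacts.item_type = 'Profile'
    AND artifacts.upload_type = 'bill'
)
GROUP BY profiles.id
HAVING COUNT(*) > 1
ORDER BY occurrences DESC;
```

If this returns rows, those ids could then be checked directly in profiles to see whether they really occur once in the table.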