I've searched for this and can't seem to find an answer. I apologize in advance as I'm sure this answer is out there, but I can't seem to find it.
I'm working with a SQL Server 2005 DB, and I'm aware that the query below doesn't represent a normalized DB, since the numPlacements field is in both the detail and rollup table. I didn't create the DB.
The below SQL gives the expected result when the where clause is used. The expected result is all rows where a matching value is missing from either table, or the two values don't match.
However, if I comment the where clause and uncomment the final AND in the ON clause, it returns over 200k rows instead of the expected 120 results.
SELECT CASE WHEN A.ID is NULL THEN B.ID ELSE A.ID END,
A.numPlacements AS 'AnumPlacements',
B.numPlacements AS 'bnumPlacements',
B.numPlacements - A.numPlacements as 'Variance'
FROM (SELECT ID,
Sum(numPlacements) AS 'numPlacements'
FROM PlacementDetailLevel
GROUP BY ID) A
FULL OUTER JOIN (SELECT ID,
Sum(numPlacements) AS 'numPlacements'
FROM PlacementRollupLevel
GROUP BY ID) B
ON A.ID = B.ID
--AND B.numPlacements <> A.numPlacements
WHERE A.numPlacements <> B.numPlacements or A.numPlacements is null or B.numPlacements is null
Any ideas as to why?
More detail below based on ypercube's suggestion:
I created TableA and TableB. They look like this:
TableA
ID numPlacements
1 10
2 20
3 30
4 40
TableB
ID numPlacements
2 20
3 31
4 40
5 50
Note that the differences are TableA has no #5, TableB has no #1, and #3 has a different numPlacements in both.
SELECT CASE WHEN A.ID is NULL THEN B.ID ELSE A.ID END AS 'ID',
A.numPlacements AS 'AnumPlacements',
B.numPlacements AS 'BnumPlacements',
B.numPlacements - A.numPlacements as 'Variance'
FROM (SELECT ID,
Sum(numPlacements) AS 'numPlacements'
FROM TableA
GROUP BY ID) A
FULL OUTER JOIN (SELECT ID,
Sum(numPlacements) AS 'numPlacements'
FROM TableB
GROUP BY ID) B
ON A.ID = B.ID
The above produces exactly what I'd expect:
ID AnumPlacements BnumPlacements Variance
1 10 NULL NULL
2 20 20 0
3 30 31 1
4 40 40 0
5 NULL 50 NULL
Let's try adding the WHERE clause.
SELECT CASE WHEN A.ID is NULL THEN B.ID ELSE A.ID END AS 'ID',
A.numPlacements AS 'AnumPlacements',
B.numPlacements AS 'BnumPlacements',
B.numPlacements - A.numPlacements as 'Variance'
FROM (SELECT ID,
Sum(numPlacements) AS 'numPlacements'
FROM TableA
GROUP BY ID) A
FULL OUTER JOIN (SELECT ID,
Sum(numPlacements) AS 'numPlacements'
FROM TableB
GROUP BY ID) B
ON A.ID = B.ID
WHERE A.numPlacements <> B.numPlacements or A.numPlacements is null or B.numPlacements is null
With the where, we get the three expected rows:
ID AnumPlacements BnumPlacements Variance
1 10 NULL NULL
3 30 31 1
5 NULL 50 NULL
Let's try adding the AND.
SELECT CASE WHEN A.ID is NULL THEN B.ID ELSE A.ID END AS 'ID',
A.numPlacements AS 'AnumPlacements',
B.numPlacements AS 'BnumPlacements',
B.numPlacements - A.numPlacements as 'Variance'
FROM (SELECT ID,
Sum(numPlacements) AS 'numPlacements'
FROM TableA
GROUP BY ID) A
FULL OUTER JOIN (SELECT ID,
Sum(numPlacements) AS 'numPlacements'
FROM TableB
GROUP BY ID) B
ON A.ID = B.ID
AND B.numPlacements <> A.numPlacements
Now, if we try it with the above AND in the join, I would expect to get row #3. Instead, I get this:
ID AnumPlacements BnumPlacements Variance
1 10 NULL NULL
2 NULL 20 NULL
2 20 NULL NULL
3 30 31 1
4 NULL 40 NULL
4 40 NULL NULL
5 NULL 50 NULL