I've seen a great many different forms of left anti-semi join. Allow me to list and name every one that comes to mind. The following queries are intended to return every ROSTER_ID
that's not used by any employee and who is the owner of that roster.
--1) NOT EXISTS, with a particular column selected in the subquery
SELECT ROSTER_ID, ROSTER_OWNER
FROM ROSTERS
WHERE NOT EXISTS (SELECT EMP_ID FROM EMPLOYEES WHERE EMP_ID = ROSTERS.EMP_ID)
--2) NOT EXISTS, with a particular column selected in the subquery and TOP (1) used
SELECT ROSTER_ID, ROSTER_OWNER
FROM ROSTERS
WHERE NOT EXISTS (SELECT TOP (1) EMP_ID FROM EMPLOYEES WHERE EMP_ID = ROSTERS.EMP_ID)
--3) NOT EXISTS, with all data selected in the subquery
SELECT ROSTER_ID, ROSTER_OWNER
FROM ROSTERS
WHERE NOT EXISTS (SELECT * FROM EMPLOYEES WHERE EMP_ID = ROSTERS.EMP_ID)
--4) NOT EXISTS, with all columns selected in the subquery and TOP (1) used
SELECT ROSTER_ID, ROSTER_OWNER
FROM ROSTERS
WHERE NOT EXISTS (SELECT TOP (1) * FROM EMPLOYEES WHERE EMP_ID = ROSTERS.EMP_ID)
--5) NOT EXISTS, but just use SELECT 1
SELECT ROSTER_ID, ROSTER_OWNER
FROM ROSTERS
WHERE NOT EXISTS (SELECT 1 FROM EMPLOYEES WHERE EMP_ID = ROSTERS.EMP_ID)
--6) NOT IN
SELECT ROSTER_ID, ROSTER_OWNER
FROM ROSTERS
WHERE EMP_ID NOT IN (SELECT EMP_ID FROM EMPLOYEES)
--7) LEFT JOIN
SELECT ROSTER_ID, ROSTER_OWNER
FROM ROSTERS
LEFT OUTER JOIN EMPLOYEES
ON EMPLOYEES.EMP_ID = ROSTERS.EMP_ID
WHERE EMPLOYEES.EMP_ID IS NULL
My question is this: Is there any objective reason - e.g. performance, backwards compatibility, portability, NULL-handling, ease of testing, extensibility, etc - to prefer any particular way of doing a left anti-semi join? I'm also interested to hear subjective reasons, e.g. style concerns or clarity, but only including them would be a non-answer.
My own research only points to the below, but it's all weak and probably subjective:
- Microsoft's U-SQL documentation hints that they prefer to use the NOT IN version (#6 of mine) in T-SQL.
NULL
handling withIN
is always cause for concern, which is a mild reason to preferEXISTS
over it.- If you're seriously concerned about backwards compatibility, then I think my LEFT JOIN syntax didn't work in the 1980's.
- Some people like to use
EXCEPT
, but I don't think that it generalises to cases where a column appears in only the outermostSELECT
.