EDIT:
I realized, in the course of attempting to explain the query, that it wouldn't always work correctly. So, I went back and figured out how to test this. I'm still bugged by the schema setup - namely, it implies that new users can't be added to an existing thread, and that a specific set of users will only be able to talk in one thread - but it was good to correct the query.
WITH Selected_Users(id) as (VALUES (@id1), (@id2), --etc--),
Threads(id) as (SELECT DISTINCT threadFk
FROM ThreadMembers as a
JOIN Selected_Users as b
ON b.id = a.userFk)
SELECT a.id
FROM Threads as a
WHERE NOT EXISTS (SELECT '1'
FROM ThreadMembers as b
LEFT JOIN Selected_Users as c
ON c.id = b.userFk
WHERE c.id IS NULL
AND b.threadFk = a.id)
AND NOT EXISTS (SELECT '1'
FROM Selected_Users as b
LEFT JOIN ThreadMembers as c
ON c.userFk = b.id
AND c.threadFk = a.id
WHERE c.userFk IS NULL)
The statement will likely have to be dynamic, to build the list of selected users, unless SQL Server has a way to provide a list as a host variable (I know DB2 does, at least from the iSeries). I don't have the perfect dataset to test this against, but against a multi-million row table (with only a many-one relationship), it returns almost instantly - I'm getting index-only access for this (hint hint).
Explanations:
WITH Selected_Users(id) as (VALUES (@id1), (@id2), --etc--),
This CTE is building the list of users so that it can be referenced as a table. This makes it easiest to deal with, although it would be possible to simply replaces it with an IN
statement everywhere (requires multiple references, though).
Threads(id) as (SELECT DISTINCT threadFk
FROM ThreadMembers as a
JOIN Selected_Users as b
ON b.id = a.userFk)
This CTE gets the list of (distinct) threads that the users are involved in. Mostly, this is just to chop the listing down to single references to threadFk
.
SELECT a.id
FROM Threads as a
... Get the selected set of threads ...
WHERE NOT EXISTS (SELECT '1'
FROM ThreadMembers as b
LEFT JOIN Selected_Users as c
ON c.id = b.userFk
WHERE c.id IS NULL
AND b.threadFk = a.id)
Where there isn't anybody 'missing' from the selected list of users - that is, it eliminates threads with user-lists that are subsets of a larger one. It also eliminates threads that have some of the users listed from the selection, but also a few that aren't, meaning that the counts of the users would match, but the actual users would not (this is where my first version failed).
EDIT:
I realized that, while the existing statement takes care of the situation where the provided list of users is a subset of users listed for a given thread, I didn't take care of the situation where the list of selected users contains a subset that is the list of users for the given thread.
AND NOT EXISTS (SELECT '1'
FROM Selected_Users as b
LEFT JOIN ThreadMembers as c
ON c.userFk = b.id
AND c.threadFk = a.id
WHERE c.userFk IS NULL)
This clause fixes that. It makes sure that there aren't any leftover users in the selection list, after excluding users for a particular thread.
The statement is now bugging me a bit - there may be a slightly better way for me to do this...
EDIT:
Muwahaha, there is a COUNT(*)
version, which should also be faster:
WITH Selected_Users(id) as (VALUES (@id1), (@id2), --etc--),
SELECT a.threadFk
FROM ThreadMembers as a
JOIN Selected_Users as b
ON b.id = a.userFk
GROUP BY a.threadFk
HAVING COUNT(*) = (SELECT COUNT(*) FROM Selected_Users)
AND COUNT(*) = (SELECT COUNT(*) from ThreadMembers as c
WHERE c.threadFk = a.threadFk)
Explanations:
SELECT a.threadFk
FROM ThreadMembers as a
JOIN Selected_Users as b
ON b.id = a.userFk
This is joining to get all threads the listed members are a part of. This is the inside equivalent to the Threads
CTE above. Actually, you could remove that CTE in the above query, too.
GROUP BY a.threadFk
We only want one instance of a given thread after all. Also (in DB2 at least), the rest of the statement isn't valid unless it's present.
HAVING COUNT(*) = (SELECT COUNT(*) FROM Selected_Users)
Verify that, for the given thread, all of the selected users are present. Or, all of the selected users must be present in the given thread.
AND COUNT(*) = (SELECT COUNT(*) from ThreadMembers as c
WHERE c.threadFk = a.threadFk)
Verify that, for the given thread, there are no non-selected users. Or, there must not be any users 'left out'
You should get index-only access for this (I seem to be). The COUNT(*)
of the result rows (for the GROUP BY
) should only be performed once, and reused. The HAVING
clause is evaluated after the GROUP BY
takes place (if I recall correctly), so the sub-select for the count from the original table should only take place once per threadFk
.