SQL: select sets containing exactly given members

Question

I'm sure there is a proper word for this which I fail to remember, but the problem is easy to describe: I have a table groupmembers, which is a simple relationship between groups and members:

id | groupid | memberid
1  | g1      | m1
2  | g1      | m2
3  | g2      | m1
4  | g2      | m2
5  | g2      | m3

Above describing two groups, one with m1 and m2 and one with m1,m2 and m3. If I want to select groupids which has members m1,m2 but no other members, how do I do it? The approaches I have tried would also return g2, as m1 and m2 is a subset of them.

UPDATE: Wow, some great answers! Let me first clarify my question a little - I want to be able to select the group that exactly matches the given members m1 and m2. So, it should NOT match if the group also contains more members than m1 and m2, and it should NOT match if the group contains less than members m1 and m2.

I had a similar question once, perhaps yours doesn't need the CSV parsing element but still seems relevant: http://stackoverflow.com/questions/7492699/how-can-i-structure-a-query-to-give-me-only-the-rows-that-match-all-values-in-a. Hope it helps... — Gibron, Oct 08 '12 at 06:19
The phrase you're looking for is "relational division" - there are a number of forms, some of which would produce g1 and g2, some would only produce g1. — Damien_The_Unbeliever, Oct 08 '12 at 06:29
@RichardTheKiwi: I think this can be solved with standard SQL as John Woo's and my answers have shown. — , Oct 08 '12 at 06:57

John Woo · Accepted Answer · 2012-10-08T06:21:39.577

11

from your phrase

I want to select groupids which has members m1,m2 but no other members

try this one, the idea behind is to count the total instances of records that match the condition and the where clause and that it is equal to the total number of records per group.

SELECT groupid
FROM table1 a
WHERE memberid IN ('m1','m2')
GROUP BY groupid
HAVING COUNT(*) = 
(
  SELECT COUNT(*)
  FROM table1 b
  WHERE b.groupid = a.groupid
  GROUP BY b.groupID
)

SQLFiddle Demo

edited Oct 08 '12 at 06:21

answered Oct 08 '12 at 06:14

John Woo

258,903
69
498
492

Unfortunately, when I test this on the SQLfiddle it will match groups that contains only one of the given member ids. So if I change the first row of the table from (1, 'g1', 'm1') to (1, 'g3', 'm1'), it will match both g1 and g3 (each having one of the members). I updated the question to clarify that I need it to exactly match the given members. Thanks! – RipperDoc Oct 08 '12 at 10:30
1

@RipperDoc: to filter out those that have fewer members an additional `and count(*) = 2` would be needed. – Oct 08 '12 at 10:53
8 years later, still find this very helpful. Should the answer be updated with `and count(*) = 2`? – Philip Borbon Sep 25 '20 at 08:32
What is the purpose of the `GROUP BY b.groupID` in the subselect? Could this query be written without that line? – ulysses_rex Jun 15 '21 at 13:07

score 4 · Answer 2 · 2012-10-08T07:02:54.127

4

You are looking for the intersection between those groups that have m1 and m2 and those groups that have exactly two members. SQL has an operator for that:

select groupid
from group_table
where memberid in ('m1','m2')
group by groupid 
having count(distinct memberid) = 2
intersect
select groupid 
from group_table
group by groupid 
having count(distinct memberid) = 2

(If you are using Oracle, intersect is called minus)

Here is a SQLFiddle demo: http://sqlfiddle.com/#!12/df94d/1

Although I think John Woo's solution could be more efficient in terms of performance.

edited Oct 08 '12 at 07:02

answered Oct 08 '12 at 06:55

Well, it's almost "standard". I only use the term when I can get it to run on at least Oracle, MySQL, SQLServer, Postgresql. Sybase and SQLite would be bonuses. MySQL doesn't support `INTERSECT`, and it is only the most installed DBMS in existence. – RichardTheKiwi Oct 08 '12 at 07:08
@RichardTheKiwi: the `intersect` operator is defined in the SQL (ANSI) standard. MySQL is stuck in the 90's when it comes to modern SQL features so it's no wonder it doesn't have it (like so many other things) – Oct 08 '12 at 07:09
This seems to do exactly what I needed, thanks! Luckily, my current DB supports INTERSECT. But would there be a more optimal way of doing it? – RipperDoc Oct 08 '12 at 10:49
@RipperDoc: John Woo's solution is most probably more efficient. It can be changed to also deal with groups having fewer members. – Oct 08 '12 at 10:54

score 2 · Answer 3 · answered Jul 09 '13 at 16:12

there is an issue with this query

SELECT groupid
FROM table1 a
WHERE memberid IN ('m1','m2')
GROUP BY groupid
HAVING COUNT(*) = 
(
  SELECT COUNT(*)
  FROM table1 b
  WHERE b.groupid = a.groupid
  GROUP BY b.groupID
)

It will match groups with m1 only or m2 only. For that we can add another count check

SELECT groupid
FROM table1 a
WHERE memberid IN ('m1','m2')
GROUP BY groupid
HAVING COUNT(*) = 2 --since we already know we should have exactly two rows
AND COUNT(*) = 
(
  SELECT COUNT(*)
  FROM table1 b
  WHERE b.groupid = a.groupid
  GROUP BY b.groupID
)

RichardTheKiwi · Answer 4 · 2012-10-08T07:21:34.860

-- sample table for discussion
CREATE TABLE tbl
  (id int, groupid varchar(2), memberid varchar(2));
INSERT INTO tbl
  (id, groupid, memberid)
VALUES
    (6, 'g4', 'm1'),
    (7, 'g4', 'm2'),
    (8, 'g6', 'm1'),
    (9, 'g6', 'm3'),
    (1, 'g1', 'm1'),
    (2, 'g1', 'm2'),
    (3, 'g2', 'm1'),
    (4, 'g2', 'm2'),
    (5, 'g2', 'm3')
;

-- the query
select a.groupid, b.groupid peer
from (select groupid, count(*) member_count, min(memberid) x, max(memberid) y
      from tbl
      group by groupid) A
join
     (select groupid, count(*) member_count, min(memberid) x, max(memberid) y
      from tbl
      group by groupid) B
  on a.groupid<b.groupid and a.member_count=b.member_count and a.x=b.x and a.y=b.y
join tbl A1
  on A1.groupid = A.groupid
join tbl B1
  on B1.groupid = B.groupid and A1.memberid = B1.memberid
group by A.groupid, b.groupid, A.member_count
having count(1) = A.member_count;

-- the result
GROUPID PEER
g1  g4

The above shows a way to get groups listed with their peers, in a highly optimal way. It works well with large databases by decomposing the groups into member counts and takes along the min and max. The groups are quickly pared down using a direct join, and only for the remaining matches is the full table consulted joining back on group ids A and B to finally determine if they are equivalent groups.

If you had 3 similar groups (101,103,104), the sets will appear as three separate rows (101,103),(101,104),(103,104) - because each pair forms a peering, so such a query is best used if you already know one of the groups that you want to find peers for. This filter would fit into the first subquery.

I don't understand what that "peer" group is supposed to be. — , Oct 08 '12 at 07:12
I'm solving a different question, it seems. I thought the question was to find equivalent groups in terms of membership. — RichardTheKiwi, Oct 08 '12 at 07:14
I realize that, and I would like to understand what "equivalent" means here. I don't understand your example results: why do the three groups (1,3,5) produce (1,4), (1,5), (4,5)? How does group 4 come into your result? — , Oct 08 '12 at 07:19
That's why I said `if you had 3 similar groups...` instead of `you can see 3 similar groups...` — RichardTheKiwi, Oct 08 '12 at 07:20
Ah, your edit makes more sense. The first example was a typo then. — , Oct 08 '12 at 07:26
If I understand correctly, you are showing a way to list which groups in the database that would match eachother exactly? That's neat! In this case however I am bound to only use input as a set of member_ids. The idea is to find whether a set of members already have a group defined, or if we ought to create one. Thanks in any case!! — RipperDoc, Oct 08 '12 at 10:48

score 1 · Answer 5 · answered Oct 08 '12 at 07:02

1

SELECT DISTINCT                          -- if (groupid, memberid) is unique
                                         -- no need for the DISTINCT
    a.groupid
FROM 
    tableX AS a
  JOIN
    tableX AS b  
      ON b.groupid = a.groupid 
WHERE a.memberid = 'm1' 
  AND b.memberid = 'm2'
  AND NOT EXISTS
      ( SELECT *
        FROM tableX AS t
        WHERE t.groupid = a.groupid
          AND t.memberid NOT IN ('m1', 'm2') 
      ) ;

answered Oct 08 '12 at 07:02

ypercubeᵀᴹ

113,259
19
174
235

This seems to produce the right output, but what if the list of members to match is of arbitrary length, how would I change the expression? – RipperDoc Oct 08 '12 at 10:35
In that case, you either need more self-joins (as many as the items, 2 in this case) or a general solution, like the ones provided by a_horse, John_Woo and Richard. – ypercubeᵀᴹ Oct 08 '12 at 11:12

score 0 · Answer 6 · edited Mar 20 '14 at 12:30

0

id | groupid | memberid
1  | g1      | m1
2  | g1      | m2
3  | g2      | m1
4  | g2      | m2
5  | g2      | m3

select GRPID from arcv where GRPID in (
select GRPID from arcv  
group by GRPID  having count(1)=2) and memberid in ('m1','m2')

edited Mar 20 '14 at 12:30

codemania

1,098
1
9
26

answered Mar 20 '14 at 12:11

user3142652

1
1

SQL: select sets containing exactly given members

6 Answers6

SQLFiddle Demo

Linked