SQL find identical group

Question

Given a table like:

id    key   val
----  ----  -----
bob   hair  red
bob   eyes  green

And another table like:

id    key   val
----  ----  -----
fred  hair  red
fred  eyes  green
fred  shoe  42
joe   hair  red 
joe   eyes  green
greg  eyes  blue
greg  hair  brown

I'd like to find people in table b who match people in table a exactly, in this case Bob and Joe. Fred doesn't count because he also has a shoe size. This is in Sybase so there's no full outer join. I've come up with a select of a select with a union that returns people who definitely aren't the same, but I'm not sure how to efficiently select people who are.

Alternatively, if it's simpler, how can I check which groups in a occur in b more than once?

What you're asking for is "relational division", if that will help your searches, but I can't find anything specifically for sybase, and don't work against it myself. Here's an article by Joe Celko: - [Divided we Stand](http://www.simple-talk.com/sql/t-sql-programming/divided-we-stand-the-sql-of-relational-division/) — Damien_The_Unbeliever, Aug 01 '11 at 10:52

score 2 · Answer 1 · edited Jan 18 '21 at 12:30

2

You can emulate a full outer join by grabbing all ids in a subquery, and then left joining them in two directions:

select  ids.id
from    (
        select  distinct id
        from    @a
        union
        select  id
        from    @b
        ) as ids
left join
        @a a1
on      a1.id = ids.id
left join
        @b b1
on      a1.id = b1.id
        and a1.[key] = b1.[key]
        and a1.val = b1.val
left join
        @b b2
on      b2.id = ids.id
left join
        @a a2
on      b2.id = a2.id
        and b2.[key] = a2.[key]
        and b2.val = a2.val
group by
        ids.id
having  sum(case when b1.id is null or a2.id is null then 1 else 0 end) = 0

Example at SE DATA.

edited Jan 18 '21 at 12:30

Community

1
1

answered Aug 01 '11 at 10:33

Andomar

232,371
49
380
404

Thanks, I've managed to work around my original problem, but if someone confirms this works for them I'll accept it. – chpatrick Aug 01 '11 at 14:24
@Vahokif Andomars script is very nice, but currently bugged compared to your description, it will be hard to fix, it does act as a full join as far as i can tell. Niktrs has the best result of those 2, but it doesn't have what could be considered a full join. My script works the best, it can handle all situations, but it will be hard to convert to Sybase and it does not have a full join logic. I included it because it is compares the rows better than the other 2 examples, you should not pick my solution. If i were to choose I would choose Niktrs's script because it retuns the correct result. – t-clausen.dk Aug 01 '11 at 23:07
@t-clausen.dk: Just curious, in which case does this script return the wrong result? (You can edit my SE Data script, and save it as a new one :)) – Andomar Aug 02 '11 at 05:41
@Andomar problem is you are comparing all the fields, you were suppose to compare the sets of names – t-clausen.dk Aug 02 '11 at 05:43

niktrs · Accepted Answer · 2011-08-02T05:12:35.010

Try this

select a.id,b.id
from a 
join b on a.[key] = b.[key] and a.val = b.val -- match all rows
join (select id,count(*) total from a group by id) a2 on a.id = a2.id -- get the total keys for table a per id
join (select id,count(*) total from b group by id) b2 on b.id = b2.id -- get the total keys for table b per id
group by a.id,b.id,a2.total,b2.total
having count(*) = a2.total AND count(*) = b2.total -- the matching row's total should be equal with each tables keys per id

After @t-clausen.dk comments I made a revision of the original sql code. In this case i count each distinct pair/value that matches on both tables, with each tables distinct pair/value.

select td.aid,td.bid
from (
select a.id as aid,b.id as bid, count(distinct a.[key]+' '+a.val) total
from a 
join b on a.[kry] = b.[key] and a.val = b.val
group by a.id,b.id
) td -- match all distinct attribute rows
join (select id,count(distinct [key]+' '+val) total from a group by id) a2 on td.aid = a2.id -- get the total distinct keys for table a per id
join (select id,count(distinct [key]+' '+val) total from b group by id) b2 on td.bid = b2.id -- get the total keys for table b per id
where td.total = a2.total AND td.total = b2.total -- the matching distinct attribute total should be equal with each tables distinct key-val pair

Tested on

Table a

bob     hair    red
bob     eyes    green
nick    hair    red
nick    eyes    green
nick    shoe    45

Table b

fred    hair    red
fred    eyes    green
joe     hair    red
joe     eyes    green
fred    shoe    42

+1 Not bad, I tested it and it works as long as there are no duplicate rows in eather table, which is very likely in this example. Also it should be easy to convert to Sybase unlike my script. — t-clausen.dk, Aug 01 '11 at 16:52
using the distinct values is not a perfect solution. Imagine bob has red hair twice and joe has green eyes twice — t-clausen.dk, Aug 02 '11 at 05:48
It was not my intension to match pair/value count also. Since I have no key details I assume some uniqueness exists in (id,key) or (id,key,val) — niktrs, Aug 02 '11 at 06:05

score 1 · Answer 3 · edited Jan 18 '21 at 12:30

This syntax will find the exact matches on different names in @t1 and @t2. I appologize because is written in MSSQL. I hope it can be converted to Sybase. After playing with it all day I want to share this beauty. I know these long scripts are not popular pointwise. I hope someone will appriciate it anyway.

This select make an exact match on @t2 within @t1.

I have populated the tables in this link https://data.stackexchange.com/stackoverflow/q/108035/

DECLARE @t1 TABLE(id varchar(10), [key] varchar(10), val varchar(10))
DECLARE @t2 TABLE(id varchar(10), [key] varchar(10), val varchar(10))

;WITH t1 AS ( 
SELECT t1.id, t1.[key], t1.val, count(*) count1, sum(count(*)) OVER(PARTITION BY t1.id) sum1 FROM @t1 t1 
GROUP BY t1.id, t1.[key], t1.val
), t2 as (
SELECT t2.id, t2.[key], t2.val, count(*) count1, sum(count(*)) OVER(PARTITION BY t2.id) sum1 FROM @t2 t2 
GROUP BY t2.id, t2.[key], t2.val
), t3 AS ( 
SELECT t1.*, sum(t1.count1) OVER(PARTITION BY t1.id) sum2
FROM t1 
JOIN t2 on t1.val = t2.val AND t1.[key]=t2.[key]
AND t1.count1 = t2.count1 AND t1.sum1 = t2.sum1
)
SELECT t3.id, t3.[key], t3.val FROM t3
JOIN @t2 t ON t3.[key] = t.[key] AND t3.val = t.val
WHERE t3.sum2 = t3.sum1

Don't try the script, it doesn't contain data, use the link where the tables are populated.

SQL find identical group

3 Answers3