In SQL, create pairs based on a grouping and count their frequency

Question

I want to create distinct pairs of `C2` values within each ID and rank the pairs by how many IDs they appear in.
Let's use this table as an example:

CREATE TABLE mytable
    (`ID` int, `C2` varchar(1), `C3` varchar(2))
;
    
INSERT INTO mytable
    (`ID`, `C2`, `C3`)
VALUES
    (1, 'A', 'a1'),
    (1, 'B', 'b1'),
    (2, 'A', 'a2'),
    (3, 'A', 'a3'),
    (3, 'C', 'c3'),
    (3, 'A', 'a4'),
    (4, 'A', 'a1'),
    (4, 'B', 'b4'),
    (4, 'A', 'a2'),
    (4, 'D', 'd1');

For ID 1, the pair would be A-B.
For ID 2, there is no pair (it has only one value).
For ID 3, the pair would be A-C (the repeated A counts once).
For ID 4, the pairs would be A-B, A-D, and B-D.

So the output would be:

| Pair | Cnt |
|------|-----|
| A-B  | 2   |
| A-C  | 1   |
| A-D  | 1   |
| B-D  | 1   |

Is this something we can do in SQL using something like GROUP_CONCAT?
I've been wrapping my head around this problem for days and still can't think of a simple solution.

Thanks!

score 1 · Accepted Answer · answered Nov 01 '20 at 12:17

I think this is a self-join and count distinct. One method is:

-- t1.c2 < t2.c2 keeps each unordered pair once and drops self-pairs
select t1.c2, t2.c2, count(distinct t1.id) as cnt
from mytable t1 join
     mytable t2
     on t1.id = t2.id and t1.c2 < t2.c2
group by t1.c2, t2.c2
order by cnt desc;
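
The query above returns the two halves of each pair as separate columns. To get the single `A-B` style label from the question, the two values can be concatenated. Below is a minimal sketch that checks this against the sample data using Python's built-in `sqlite3` module with an in-memory database. The SQLite engine and the `pair_counts` helper are assumptions made for illustration; in MySQL the label would be built with `CONCAT(t1.c2, '-', t2.c2)` rather than `||`.

```python
import sqlite3

# Sample rows from the question: (ID, C2, C3).
ROWS = [
    (1, "A", "a1"), (1, "B", "b1"),
    (2, "A", "a2"),
    (3, "A", "a3"), (3, "C", "c3"), (3, "A", "a4"),
    (4, "A", "a1"), (4, "B", "b4"), (4, "A", "a2"), (4, "D", "d1"),
]

# SQLite concatenates strings with ||; MySQL would use CONCAT(...).
PAIR_QUERY = """
select t1.c2 || '-' || t2.c2 as pair,
       count(distinct t1.id) as cnt
from mytable t1
join mytable t2
  on t1.id = t2.id and t1.c2 < t2.c2
group by t1.c2, t2.c2
order by cnt desc, pair
"""


def pair_counts(rows):
    """Load rows into an in-memory table and return (pair, cnt) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table mytable (id int, c2 varchar(1), c3 varchar(2))")
    conn.executemany("insert into mytable values (?, ?, ?)", rows)
    result = conn.execute(PAIR_QUERY).fetchall()
    conn.close()
    return result


print(pair_counts(ROWS))
# [('A-B', 2), ('A-C', 1), ('A-D', 1), ('B-D', 1)]
```

The secondary sort on `pair` is only there to make ties come back in a stable order.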

Depending on your data, it might be more efficient to remove duplicates first and then join:

with tt as (
      -- one row per (id, c2), so repeated values within an ID count once
      select distinct id, c2
      from mytable
     )
select t1.c2, t2.c2, count(t1.id) as cnt
from tt t1 join
     tt t2
     on t1.id = t2.id and t1.c2 < t2.c2
group by t1.c2, t2.c2
order by cnt desc;
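
Once duplicates are removed, each ID contributes at most one row per pair, so a plain row count and `count(distinct id)` should agree. Here is a small sketch, again assuming an in-memory SQLite database via Python's `sqlite3` (the `run` helper is hypothetical), that compares the two approaches on the sample data. It says nothing about which is faster; that depends on the data and the engine.

```python
import sqlite3

# Sample rows from the question: (ID, C2, C3).
ROWS = [
    (1, "A", "a1"), (1, "B", "b1"),
    (2, "A", "a2"),
    (3, "A", "a3"), (3, "C", "c3"), (3, "A", "a4"),
    (4, "A", "a1"), (4, "B", "b4"), (4, "A", "a2"), (4, "D", "d1"),
]

# De-duplicate (id, c2) first, then count joined rows.
DEDUP_QUERY = """
with tt as (
    select distinct id, c2 from mytable
)
select t1.c2, t2.c2, count(*) as cnt
from tt t1
join tt t2
  on t1.id = t2.id and t1.c2 < t2.c2
group by t1.c2, t2.c2
order by cnt desc, t1.c2, t2.c2
"""

# Join the raw table and count distinct IDs instead.
DISTINCT_QUERY = """
select t1.c2, t2.c2, count(distinct t1.id) as cnt
from mytable t1
join mytable t2
  on t1.id = t2.id and t1.c2 < t2.c2
group by t1.c2, t2.c2
order by cnt desc, t1.c2, t2.c2
"""


def run(query, rows):
    """Execute query against a fresh in-memory copy of the sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table mytable (id int, c2 varchar(1), c3 varchar(2))")
    conn.executemany("insert into mytable values (?, ?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


dedup = run(DEDUP_QUERY, ROWS)
direct = run(DISTINCT_QUERY, ROWS)
print(dedup == direct)  # True
```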

Gosh, I'm realizing how this was rather simple and I was over-complicating the problem. Guess I didn't think about using < comparison for characters. Thanks for the help! — creativename, Nov 01 '20 at 12:25

score 0 · Answer 2 · GMB · 2020-11-01T12:20:24.813

You can self-join and aggregate:

select t1.c2 c21, t2.c2 c22, count(distinct t1.id) cnt 
from mytable t1
inner join mytable t2
    on  t1.id = t2.id
    and t1.c2  < t2.c2
group by t1.c2, t2.c2

edited Nov 01 '20 at 12:20

answered Nov 01 '20 at 11:51

Hello! I changed your `a` to `c2` but I'm getting duplicates in my results... | AB | 3 || AC | 2 || AD | 2 || BD | 1 | – creativename Nov 01 '20 at 12:12
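
The counts in this comment (3, 2, 2, 1) match what the self-join gives on the sample data when joined rows are counted instead of distinct IDs, because IDs 3 and 4 each contain A twice. The sketch below shows the difference using Python's `sqlite3` with an in-memory database. It is an illustration under that assumption, not a reconstruction of the query the commenter actually ran, and the `counts` helper is hypothetical.

```python
import sqlite3

# Sample rows from the question: (ID, C2, C3).
ROWS = [
    (1, "A", "a1"), (1, "B", "b1"),
    (2, "A", "a2"),
    (3, "A", "a3"), (3, "C", "c3"), (3, "A", "a4"),
    (4, "A", "a1"), (4, "B", "b4"), (4, "A", "a2"), (4, "D", "d1"),
]

# {agg} is filled in with the aggregate expression to compare.
QUERY = """
select t1.c2, t2.c2, {agg} as cnt
from mytable t1
join mytable t2
  on t1.id = t2.id and t1.c2 < t2.c2
group by t1.c2, t2.c2
order by t1.c2, t2.c2
"""


def counts(agg, rows):
    """Run the self-join with the given aggregate on a fresh in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table mytable (id int, c2 varchar(1), c3 varchar(2))")
    conn.executemany("insert into mytable values (?, ?, ?)", rows)
    result = conn.execute(QUERY.format(agg=agg)).fetchall()
    conn.close()
    return result


row_counts = counts("count(*)", ROWS)              # counts joined rows
id_counts = counts("count(distinct t1.id)", ROWS)  # counts IDs per pair
print(row_counts)  # [('A', 'B', 3), ('A', 'C', 2), ('A', 'D', 2), ('B', 'D', 1)]
print(id_counts)   # [('A', 'B', 2), ('A', 'C', 1), ('A', 'D', 1), ('B', 'D', 1)]
```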