0

I have table like below:

|Group|User|
|  1  | X  | 
|  1  | Y  |
|  1  | Z  |
|  2  | X  |
|  2  | Y  |
|  2  | Z  |
|  3  | X  |
|  3  | Z  |
|  4  | X  |

I want to calculate intersections of groups: 1&2, 1&2&3, 1&2&3&4. Then I want to see users in each intersection and how many of them are there. In the given example should be:

1&2     -> 3
1&2&3   -> 2
1&2&3&4 -> 1

Is it possible using SQL? If yes, how can I start?

On a daily basis I'm working with Python, but now I have to translate this code into SQL code and I have huge problem with it.

Joe
  • 69
  • 1
  • 8

1 Answers1

1

In PostgreSQL you can use JOIN or INTERSECT. Example with JOIN:

select count(*)
from t a
join t b on b.usr = a.usr
where a.grp = 1 and b.grp = 2

Then:

select count(*)
from t a
join t b on b.usr = a.usr
join t c on c.usr = a.usr
where a.grp = 1 and b.grp = 2 and c.grp = 3

And:

select count(*)
from t a
join t b on b.usr = a.usr
join t c on c.usr = a.usr
join t d on d.usr = a.usr
where a.grp = 1 and b.grp = 2 and c.grp = 3 and d.grp = 4

Result in 3, 2, and 1 respectively.

EDIT - You can also do:

select count(*)
from (
  select usr, count(*) 
  from t
  where grp in (1, 2, 3, 4)
  group by usr
  having count(*) = 4 -- number of groups
) x

See running example at db<>fiddle.

The Impaler
  • 45,731
  • 9
  • 39
  • 76
  • What If I have more groups? Like 15-20. Can be there any other way to solve it? – Joe May 21 '22 at 19:41
  • Great! It looks good! Can I add loop or sth, so I don't need to write this 15-20 times? Cuz I also have to modify tuple with groups: where grp in (1, 2, 3) and It'll be frustrating. – Joe May 21 '22 at 20:03
  • 1
    @Joe , you can create a parameterized query passing it the list of groups in a form of array or csv string. If you have a problem creating a query like that you'd better ask another question. – Serg May 22 '22 at 07:32
  • It's something wrong with this approach. I made it twice in 2 different ways: - length(arrayIntersect(groupUniqArray, )) in CH and - using pandas in Python. Both ways had same results, but different than yours. – Joe May 22 '22 at 11:05
  • Ok. I got it. I had to drop duplicates or change the condition from having count(*) = 4 to having count(*) >= 4. – Joe May 22 '22 at 11:52