3

Suppose I have labels with multiple stores associated with them like so:

label_id | store_id
--------------------
label_1  | store_1
label_1  | store_2
label_1  | store_3
label_2  | store_2
label_2  | store_3
label_3  | store_1
label_3  | store_2

Is there any good way in SQL (or jooq) to get all the store ids in the intersection of the labels? Meaning just return store_2 in the example above because store_2 is associated with label_1, label_2, and label_3? I would like a general method to handle the case where I have n labels.

GMB
  • 216,147
  • 25
  • 84
  • 135
Student
  • 287
  • 4
  • 9
  • "because store_2 is associated with label_1, label_2, and label_3". I'm lost. `store_1` has that association, not `store_2`. – Gordon Linoff Jul 25 '20 at 00:59
  • @GordonLinoff: I was looking for the intersection of label_1_stores = [store_1, store_2, store_3], label_2_stores = [store_2, store_3] and label_3_stores = [store_1, store_2]. The intersection of those 3 arrays is only store_2. – Student Jul 25 '20 at 22:31

4 Answers4

4

This is a relational division problem, where you want the stores that have all possible labels. Here is an approach using aggregation:

select store_id
from mytable
group by store_id
having count(*) = (select count(distinct label_id) from mytable)

Note that this assumes no duplicate (store_id, label_id) tuples. Otherwise, you need to change the having clause to:

having count(distinct label_id) = (select count(distinct label_id) from mytable)
GMB
  • 216,147
  • 25
  • 84
  • 135
  • Thanks for the quick response. What if I wanted provide an input of labels and find the intersection of all the stores associated with those labels? Is there a good way to modify the existing SQL you have now? – Student Jul 24 '20 at 22:07
3

Since you're also looking for a jOOQ solution, jOOQ supports a synthetic relational division operator, which produces a more academic approach to relational division, using relational algebra operators only:

// Using jOOQ
T t1 = T.as("t1");
T t2 = T.as("t2");

ctx.select()
   .from(t1.divideBy(t2).on(t1.LABEL_ID.eq(t2.LABEL_ID)).returning(t1.STORE_ID).as("t"))
   .fetch();

This produces something like the following query:

select t.store_id
from (
  select distinct dividend.store_id
  from t dividend
  where not exists (
    select 1
    from t t2
    where not exists (
      select 1
      from t t1
      where dividend.store_id = t1.store_id
      and t1.label_id = t2.label_id 
    )
  )
) t

In plain English:

Get me all the stores (dividend), for which there exists no label (t2) for which that store (dividend) has no entry (t1)

Or in other words

If there was a label (t2) that a store (dividend) does not have (t1), then that store (dividend) would not have all the available labels.

This isn't necessarily more readable or faster than GROUP BY / HAVING COUNT(*) based implementations of relational divisions (as seen in other answers), in fact, the GROUP BY / HAVING based solutions are probably preferrable here, especially since only one table is involved. A future version of jOOQ might use the GROUP BY / HAVING approach, instead: #10450

But in jOOQ, it might be quite convenient to write this way, and you asked for a jOOQ solution :)

Lukas Eder
  • 211,314
  • 129
  • 689
  • 1,509
1

Then convert the query by @GMB into an SQL function that takes an array and returns a table of store_id's.

create or replace 
function stores_with_all_labels( label_list text[] )
 returns table (store_id text) 
 language  sql
as $$
    select store_id
      from label_store
     where label_id = any (label_list)
     group by store_id
    having count(*) = array_length(label_list,1);
$$;

Then all that's needed is a simple select. See complete example here.

Belayer
  • 13,578
  • 2
  • 11
  • 22
0

If there are three particular labels you want, you can use:

select store_id
from t
where label in (1, 2, 3)
group by store_id
having count(*) = 3;

If you want only those three labels and nothing else, then:

select store_id
from t
group by store_id
having count(*) = 3 and
       count(*) filter (where label in (1, 2, 3)) = count(*);
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786