3

I have a list of product IDs and I want to find out which orders contain all those products. Orders table is structured like this:

order_id | product_id
----------------------
1        | 222
1        | 555
2        | 333

Obviously I can do it with some looping in PHP but I was wondering if there is an elegant way to do it purely in mysql. My ideal fantasy query would be something like:

SELECT order_id
FROM orders
WHERE (222,555) IN GROUP_CONCAT(product_id)
GROUP BY order_id

Is there any hope or should I go read Tolkien? :) Also, out of curiosity, if not possible in mysql, is there any other database that has this functionality?

Martin Smith
  • 438,706
  • 87
  • 741
  • 845
Amati
  • 1,484
  • 1
  • 16
  • 29
  • This is stringly-typed programming: `(222,555) IN GROUP_CONCAT(product_id)`, and generally discouraged. Use proper set-based approach – Michael Buen Sep 29 '12 at 14:48
  • Discouraged? I didn't think this would even execute, let alone produce any meaningful results. I believe I made it pretty clear it's a 'fantasy' query, only provided to give a better idea of what I'm looking for. – Amati Sep 29 '12 at 15:24
  • Yep I know it's not even a valid construct. I get the idea, I just thought you really wanted to use group_concat ;-) – Michael Buen Sep 29 '12 at 15:31
  • Oh, I definitely wanted to use group_concat :) – Amati Sep 29 '12 at 15:47
  • That's what I meant by stringly-typed programming :-) avoid processing things through string approach(e.g. `group_concat`) especially when a set-based proper approach is available. Stringly-typed programming: http://stackoverflow.com/a/11846552 Set-based proper: http://stackoverflow.com/a/11846457 – Michael Buen Sep 29 '12 at 15:55
  • RE `..if not possible in mysql, is there any other database that has this functionality?` You can sort of do the same thing with Postgresql, you can use its contain operator `@>`, and you don't have to track the quantity of matching result, i.e. the 2 in `having ... = 2` http://www.sqlfiddle.com/#!1/4efca/4 Related to: http://stackoverflow.com/questions/12433260/postgresql-query-on-text-array-value/12434149#12434149 – Michael Buen Sep 29 '12 at 16:06

4 Answers4

5

You were close

SELECT order_id
FROM orders
WHERE product_id in (222,555) 
GROUP BY order_id
HAVING COUNT(DISTINCT product_id) = 2

Regarding your "out of curiosity" question in relational algebra this is achieved simply with division. AFAIK no RDBMS has implemented any extension that makes this as simple in SQL.

Martin Smith
  • 438,706
  • 87
  • 741
  • 845
1

I have a preference for doing set comparisons only in the having clause:

select order_id
from orders
group by order_id
having sum(case when product_id = 222 then 1 else 0 end) > 0 and
       sum(case when product_id = 555 then 1 else 0 end) > 0

What this is saying is: get me all orders where the order has at least one product 222 and at least one product 555.

I prefer this for two reasons. The first is generalizability. You can arrange more complicated conditions, such as 222 or 555 (just by changing the "and" to and "or"). Or, 333 and 555 or 222 without 555.

Second, when you create the query, you only have to put the condition in one place, in the having clause.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
1

Assuming your database is properly normalized, i.e. there's no duplicate Product on a given Order

Mysqlism:

select order_id
from orders
group by order_id
having sum(product_id in (222,555)) = 2

Standard SQL:

select order_id
from orders
group by order_id
having sum(case when product_id in (222,555) then 1 end) = 2

If it has duplicates:

CREATE TABLE tbl
    (`order_id` int, `product_id` int)
;

INSERT INTO tbl
    (`order_id`, `product_id`)
VALUES
    (1, 222),
    (1, 555),
    (2, 333),
    (1, 555)
;

Do this then:

select order_id
from tbl
group by order_id
having count(distinct case when product_id in (222,555) then product_id end) = 2

Live test: http://www.sqlfiddle.com/#!2/fa1ad/5

Michael Buen
  • 38,643
  • 9
  • 94
  • 118
1
CREATE TABLE orders
        ( order_id INTEGER NOT NULL
        , product_id INTEGER NOT NULL
        );
INSERT INTO orders(order_id,product_id) VALUES
 (1, 222 ) , (1, 555 ) , (2, 333 )
, (3, 222 ) , (3, 555 ) , (3, 333 ); -- order#3 has all the products

CREATE TABLE products AS (SELECT DISTINCT product_id FROM orders);

SELECT *
FROM orders o1
   --
   -- There should not exist a product
   -- that is not part of our order.
   --
WHERE NOT EXISTS (
        SELECT *
        FROM products pr
        WHERE 1=1
           -- extra clause: only want producs from a literal list
        AND pr.product_id IN (222,555,333)
           --  ... that is not part of our order...
        AND NOT EXISTS ( SELECT *
                FROM orders o2
                WHERE o2.product_id = pr.product_id
                AND o2.order_id = o1.order_id
                )
        );

Result:

 order_id | product_id 
----------+------------
        3 |        222
        3 |        555
        3 |        333
(3 rows)
wildplasser
  • 43,142
  • 8
  • 66
  • 109
  • I believe you misunderstood my question. Please, see the other answers. – Amati Sep 29 '12 at 15:28
  • This lists the orders that contain all the products. If that is not what you want you should rephrase your question, IMHO. (and I don't think I should need to read the answers in order to find out what the question means) – wildplasser Sep 29 '12 at 15:29
  • I have a *list* of products. That list may change every time I run the query. What your query seems to be doing is to check the orders table against the *full* products table. You certainly don't need to read anything. It's just that you spent time answering this and thought I'd point out why it's incorrect. Thank you anyway. – Amati Sep 29 '12 at 15:46
  • Databases don't have lists. That's why I put the list of producs into a *table* of products. You did not provide a table, so I made one myself (using all the existing producs in the orders table). – wildplasser Sep 29 '12 at 15:50
  • OK, sorry if I acted too quickly yesterday. I'll upvote you for what seems to be a correct answer and forcing me to read about (NOT) EXISTS but that still looks way too convoluted and hard to read compared to the other suggestions. – Amati Sep 30 '12 at 17:01
  • Sorry too. The OQ was not too clear, IMHO. The query actually works, even in the presence of duplicate product_id's per client_id. And yes: the double "NOT EXISTS" is spooky, but it expresses exactly what you intended (IMHO) – wildplasser Sep 30 '12 at 17:10