2

I have tables:

Table Site

╔════╦═══════════════╗
║ ID ║     NAME      ║
╠════╬═══════════════╣
║  1 ║ stackoverflow ║
║  2 ║ google.com    ║
║  3 ║ yahoo.com     ║
║  4 ║ cnn.com       ║
╚════╩═══════════════╝

Table Widget

╔════╦════════════╗
║ ID ║    NAME    ║
╠════╬════════════╣
║  1 ║ polling    ║
║  2 ║ comments   ║
║  3 ║ newsletter ║
║  4 ║ mail       ║
╚════╩════════════╝

Table SiteWidget

╔═════════╦═══════════╗
║ SITE_ID ║ WIDGET_ID ║
╠═════════╬═══════════╣
║       1 ║         1 ║
║       1 ║         2 ║
║       2 ║         2 ║
║       2 ║         3 ║
║       4 ║         2 ║
║       3 ║         1 ║
║       3 ║         3 ║
║       1 ║         4 ║
║       3 ║         4 ║
║       4 ║         1 ║
║       4 ║         4 ║
╚═════════╩═══════════╝

I would like get all sites with comments (2) and mail (4).

I try:

SELECT * FROM Site 
LEFT JOIN SiteWidget ON Site.id = SiteWidget.site_id 
WHERE SiteWidget.widget_id IN (2, 4) 

but this return me stackoverflow (2, 4 - OK), google.com (2 - NOT OK - without 4), yahoo.com (4 - NOT OK, without 2) and cnn.com (2, 4 - OK). How can i get all sites with 2 and 4? Always together, not singly.

John Woo
  • 258,903
  • 69
  • 498
  • 492
Al Steche
  • 23
  • 2
  • 4
    Why is your question tagged with `c#`, `java` and `php` at the same time? Could you narrow down the programming language you are using? – Darin Dimitrov Feb 03 '13 at 10:16

7 Answers7

4

Here's one way to do it - use an extra join so that you can look for combinations of 2 widgets:

SELECT * FROM Site s
INNER JOIN SiteWidget w1 ON (s.id = w1.site_id)
INNER JOIN SiteWidget w2 ON (s.id = w2.site_id)
WHERE w1.widget_id=2 and w2.widget_id=4;
Paul Dixon
  • 295,876
  • 54
  • 310
  • 348
  • this is good when the OP is always searching for two values only. What if he wants to search for 3 or 4 values? You need to add another join right? and that I think the query is not flexible and requires another resource due to joins. – John Woo Feb 03 '13 at 10:35
  • 3
    That's correct - each additional value needs a join. Only benchmarking would tell you if it consumes more resources than using HAVING, but it should exhibit good index usage. Not difficult to generate such a query programatically for multiple values either (and you could move the WHERE clauses into the JOIN clauses to make it even easier to generate) – Paul Dixon Feb 03 '13 at 10:39
  • Anyway I agree on what you said `:D` – John Woo Feb 03 '13 at 10:41
  • 1
    @PaulDixon, for what it's worth, I ran such benchmarks in a similar situation some time ago, and you are correct - the formula *does* exhibit very good index usage (MySQL 5.5). I expect it does on most platforms. I too find it easier on the clipboard :-) – LSerni Feb 03 '13 at 10:54
3

This problem is called Relational Division.

SELECT  a.Name
FROM    Site a
        INNER JOIN SiteWidget b
            ON a.ID = b.Site_ID
        INNER JOIN Widget c
            ON b.Widget_ID = c.ID
WHERE   c.Name IN ('comments','mail')
GROUP   BY a.Name
HAVING  COUNT(*) = 2

if uniqueness was not enforce on widget_id for every site_id, DISTINCT keyword is needed.

SELECT  a.Name
FROM    Site a
        INNER JOIN SiteWidget b
            ON a.ID = b.Site_ID
        INNER JOIN Widget c
            ON b.Widget_ID = c.ID
WHERE   c.Name IN ('comments','mail')
GROUP   BY a.Name
HAVING  COUNT(DISTINCT c.Name) = 2

Other Link

John Woo
  • 258,903
  • 69
  • 498
  • 492
2

Try:

SELECT * FROM Site
INNER JOIN SiteWidget SW1
    ON SW1.widget_id = 2 
    AND Site.id = SW1.site_id
INNER JOIN SiteWidget SW2
    ON SW2.widget_id = 4
    AND Site.id = SW2.site_id
RMalke
  • 4,048
  • 29
  • 42
1

You can use this if you want filter by widget name

SELECT
    S.id,
    S.name 
FROM Site S
    JOIN SiteWidget SW
        ON S.id = SW.site_id
    JOIN Widget W
        ON SW.widget_id = W.id
WHERE W.name IN ('comments', 'mail')
GROUP BY S.Id,S.name
HAVING COUNT(DISTINCT W.name) = 2

or if you want to filter by widget id

SELECT
    S.id,
    S.name 
FROM Site S
    JOIN SiteWidget SW
        ON S.id = SW.site_id
WHERE SW.widget_id IN (2, 4)
GROUP BY S.Id,S.name
HAVING COUNT(DISTINCT SW.widget_id) = 2
Hamlet Hakobyan
  • 32,965
  • 6
  • 52
  • 68
1

Got to join twice

SELECT * FROM Site 
inner JOIN SiteWidget m ON Site.id = m.site_id and m.widget_id = 4
inner Join SiteWidget c ON Site.id = c.site_id and c.widget_id = 2
Tony Hopkinson
  • 20,172
  • 3
  • 31
  • 39
1

Here is another way, Fiddle (Thanks @JW. for the fiddle tables & data)

select s.id, s.name
from site s join (
   select sw.site_id, count(w.id) cnt
   from SiteWidget sw join widget w on sw.widget_id = w.id 
   where w.id in (2,4) 
   group by sw.site_id
) T on s.id = T.site_id and T.cnt = 2
Kaf
  • 33,101
  • 7
  • 58
  • 78
1

Literally, you would need two different JOINs:

SELECT * FROM Site
    JOIN SiteWidget AS mail     ON (Site.id = mail.site_id AND mail.widget_id = 4)
    JOIN SiteWidget AS comments ON (Site.id = comments.site_id AND comments.widget_id = 2);

If you are sure that the SiteWidget table has no duplicates, e.g. because (site_id, widget_id) is primary key as is usually done for MtM relations, then you can also use HAVING: this is MySQL syntax:

SELECT Site.* FROM Site
    JOIN SiteWidget ON (SiteWidget.site_id = Site.id AND widget_id IN (2,4))
    GROUP BY Site.id HAVING COUNT(*) = 2;

since, due to uniqueness, the only possibility for a site to appear twice is to have both widgets. Some believe this to be an abuse of GROUP BY, and some SQLs (PostgreSQL, if I remember correctly) will require Site's fields to appear in GROUP BY or an aggregate functions in SELECT even if they functionally depend from the group-by column Site.id.

I find the first formula to be clearer and safer, and, I expect, more or less as fast as the second.

This is both because the many-to-many join table is very small (and index-covered to boot), and because this kind of operation was standard from day one, and is one of the most optimized. For example, I expect checks for widget_id 2 and 4 to run in parallel with a single logical read of the SiteWidget table in the join buffer. Even if they didn't, they would likely be loaded in parallel with a single physical read, the other hitting a SQL cache or, at the very least, the IOSS cache.

You might also try this slight variation, which should be the faster:

SELECT Site.* FROM Site
    JOIN SiteWidget AS mail     ON (Site.id = mail.site_id AND mail.widget_id = 4)
    JOIN SiteWidget AS comments ON (mail.site_id = comments.site_id AND comments.widget_id = 2);

which ought to run the main JOIN against the smallest SiteWidget table, and come out with id lookups into Site. This is actually what is likely to get done even if you word the query as in the first instance.

The first formula is perhaps easier to extend by copying and pasting if you ever need to add, say, the polling widget.

LSerni
  • 55,617
  • 10
  • 65
  • 107