2

Possible Duplicate:
Mysql: Perform of NOT EXISTS. Is it possible to improve permofance?

Is there a better/optimal way to do it. Should I use exists instead of join? Or two separate queries? And what about temporary tables, as I was reading about those but uncertain.

Getting members email from a group. Checking that they have not received a item yet.

SELECT m.email,g.id 
FROM group g 
  LEFT JOIN  members m 
    ON  g.mid = m.id 
    AND g.gid='1' 
WHERE NOT EXISTS 
      ( SELECT id 
        FROM items AS i 
        WHERE i.mid=m.id 
        AND i.item_id='5'
      ) 
Community
  • 1
  • 1
timeout
  • 35
  • 5

2 Answers2

4

Here's the same thing written as a JOIN:

SELECT m.email, g.id
From members m
JOIN group g ON g.mid = m.id AND g.gid = '1' 
LEFT JOIN items i ON i.mid = m.id AND i.item_id = '5'
WHERE i.id IS NULL

Use the following compound indexes:

group (mid, gid)
items (mid, item_id)

I reversed the LEFT JOIN on members and group because it seems like you're returning members, not groups, and I changed the LEFT JOIN into an INNER JOIN since you only want members from that group.

I think this one might read better:

SELECT m.email, g.id
From members m
JOIN group g ON g.mid = m.id
LEFT JOIN items i ON i.mid = m.id AND i.item_id = 5
WHERE g.gid = 1
  AND i.id IS NULL

You might be wondering if we can move the i.item_id = 5 part to the WHERE clause also. You can't because there are no rows where i.id IS NULL and i.item_id = 5. You must do the join first and then eliminate the NULL rows in the WHERE clause.

Marcus Adams
  • 53,009
  • 9
  • 91
  • 143
  • yes, you are right the join should be other way round lol. question regarding 2 joins opposed to 1 join and exists, is that the way to go performance wise. i ask because this query will be running in a cron job. – timeout Jun 26 '12 at 17:46
  • @timeout, NOT EXISTS [is slower than JOIN](http://explainextended.com/2009/09/18/not-in-vs-not-exists-vs-left-join-is-null-mysql/), but the most important part is the indexes. – Marcus Adams Jun 26 '12 at 17:59
  • interesting read, thanks – timeout Jun 26 '12 at 18:30
  • I don't believe the second query (with the predicate on `i.item_id = 5` moved to the `WHERE` clause will return the specified result set. That predicate needs to be in the `ON` clause. (I make a note of this here for future readers, because this was the "selected" answer.) – spencer7593 Jun 26 '12 at 18:47
  • @spencer7593 - yours and Marcus return same results. :) ty – timeout Jun 26 '12 at 20:41
  • @spencer7593, I'm pretty sure the results will be the same. If you can explain why they won't be, or can verify, please do. – Marcus Adams Jun 26 '12 at 20:51
  • @Marcus: because it's an anti-join. The LEFT JOIN and the predicate on i.id IS NULL, there will not be any rows that satisfy the `i.item_id = 5` predicate. Test case:::: CREATE TABLE `group` (id INT PRIMARY KEY, gid INT, `mid` INT); CREATE TABLE items (id INT PRIMARY KEY, item_id INT, `mid` INT); CREATE TABLE members (id INT PRIMARY KEY, email VARCHAR(40)); INSERT INTO test.group VALUES (1,1,1); INSERT INTO test.group VALUES (2,1,2); INSERT INTO items VALUES (1,5,1); INSERT INTO members VALUES (1,'one@m.com'),(2,'two@m.com'); – spencer7593 Jun 26 '12 at 21:08
  • @timeout: I was addressing the SECOND query in the answer, for future readers. It returns a different result than the FIRST query in the answer, because the predicate on i.item_id is moved from the ON clause to the WHERE clause. That impacts the anti-join. – spencer7593 Jun 26 '12 at 21:15
  • @spencer7593, thank you for taking the time to school me. I stared at that for 15 minutes before I could see the error. I fixed the answer. – Marcus Adams Jun 27 '12 at 13:47
1

I don't believe a temporary table is necessary. We'd really only go that route if we can't get acceptable performance.

From your query, we gather your schema looks like this:

group (id INT PK, gid INT, mid INT)
items (id INT PK, item_id INT, mid INT)
members (id INT PK, email VARCHAR)

It looks like your group table is really a "membership" table, which resolves/implements a many-to-many relationship between a group and a person. (That is, a person can be a member of zero, one or more groups; a group can have zero, or or more persons as members.)

You are using a LEFT JOIN between group and members. This will return a row for group (returning group.id) when there are no matching members, with a NULL for members.email (which may be what you want). But if you only want to return email addresses, then this can be changed to an INNER JOIN.

The NOT EXISTS predicate can be replaced with an OUTER JOIN and a test for a NULL value returned from the JOINED table. If the group.gid and/or items.item_id columns are numeric datatype, then you can remove the quotes from around the integer literals in the predicates.

Here is an alternative which will return an equivalent resultset, and may perform better:

SELECT m.email
     , g.id 
  FROM members m
  JOIN group g ON g.mid = m.id AND g.gid = 1
  LEFT
  JOIN items i ON i.mid = m.id AND i.item_id = 5
 WHERE i.id IS NULL

ADDENDUM:

TEST CASE (provided in comment on selected answer) demonstrates difference in result set between queries with the predicate items.item_id = 5 in the ON clause and in the WHERE clause. (Moving this predicate to the WHERE clause messes with the anti-join.)

CREATE TABLE `group` (`id` INT PRIMARY KEY, `gid` INT, `mid` INT);
CREATE TABLE `items` (`id` INT PRIMARY KEY, `item_id` INT, `mid` INT);
CREATE TABLE `members` (`id` INT PRIMARY KEY, `email` VARCHAR(40));

INSERT INTO `group` VALUES (1,1,1), (2,1,2);
INSERT INTO `items` VALUES (1,5,1);
INSERT INTO `members` VALUES (1,'one@m.com'),(2,'two@m.com'); 
spencer7593
  • 106,611
  • 15
  • 112
  • 140