optimise IN subquery

Question

I have a workers table and an associated workerGeofence table.

CREATE TABLE IF NOT EXISTS `workergeofences` (
`ID` int(11) NOT NULL,
  `WorkerID` varchar(20) NOT NULL,
  `GeofenceID` int(11) NOT NULL,
  `isActive` tinyint(4) NOT NULL
) ENGINE=InnoDB AUTO_INCREMENT=107 DEFAULT CHARSET=latin1;

I need to return only workers who have at least one entry in the workerGeofences table with an isActive of 1.

I'm able to get the desired outcome withe following:

    SELECT distinct w.ID, Title, FName, SName, Email, Birthday, Address, Phone, description,
 companyID 
FROM Workers w WHERE companyID = ? 
and w.ID IN (SELECT WorkerID FROM WorkerGeofences WHERE isActive <> 0)
    limit ?,10

but the in subquery is exhaustive as when I run the explain, I can see it is scanning the entire table. How would I get around this?

You can't show us only half the problem. – Strawberry Feb 03 '16 at 14:37 — Strawberry, Feb 03 '16 at 14:37

score 2 · Answer 1 · answered Feb 03 '16 at 14:37

2

You are on the right track, but you shouldn't need select distinct. This slows down queries, unless you know there are duplicates -- and that is unlikely because you are selecting WOrkers.Id.

SELECT w.* 
FROM Workers w 
WHERE w.companyID = ? AND
      EXISTS (SELECT 1
              FROM workerGeofences wg
              WHERE w.ID = wg.WorkerID AND wg.isActive <> 0
             )
LIMIT ?, 10;

Then, for this query, you want indexes on Workers(CompanyId, Id) and workerGeofences(WorkerId, isActive).

Note: I just put in select * for convenience. I assume all the columns are coming from the Workers table.

answered Feb 03 '16 at 14:37

Gordon Linoff

1,242,037
58
646
786

(This is better than the accepted answer because (1) it emphasizes `EXISTS`, not the slow `IN` and (2) it mentions indexes which may help.) – Rick James Feb 03 '16 at 22:20
@RickJames It is, I upvoted it but I had already accepted the other answer before this was posted. I was already using indexes thankfully and went for the exists option in the accepted answer – user2363025 Feb 04 '16 at 09:32
@GordonLinoff one question: is there any reason why you checked isActive <> 0 instead of isActive = 1? I thought checking <> is more expensive on the query? – user2363025 Feb 04 '16 at 09:43
The Optimizer has no way of knowing how many different values `isActive` can take on. You started with `<> 0`, Gordon copied you. However it is potentially inefficient -- it involves two ranges "< 0" and "> 0". `=1` is more efficient because it is only one range, "= 1". So Sagi's 3rd example has that improvement. – Rick James Feb 04 '16 at 16:44
Actually, the inequality shouldn't really make a difference. It is the second key in the index, so an inequality will still use the index (an equality would be slightly better, but that is a micro-optimization). – Gordon Linoff Feb 05 '16 at 03:09

score 1 · Accepted Answer · edited Feb 03 '16 at 22:17

First of all , your join is wrong! you are not comparing any common column on both table, you should add where workerGeofences.workerID = w.id like this:

SELECT  w.ID, Title, FName, SName, Email, Birthday, Address, Phone,
        description, companyID
    FROM  Workers w
    join  workerGeofences
    WHERE  workerGeofences.workerID = w.ID companyID = ?
      and  w.ID IN (
        SELECT  WorkerID
            FROM  WorkerGeofences s
            WHERE  isActive <> 0
              and  s.workerID = w.id
                   )
    limit  0,10

And second, you are not selecting anything from the second table, so the join is unessesary and in your IN statement, you are not comparing the right ID's so your query should be:

SELECT  w.ID, Title, FName, SName, Email, Birthday, Address, Phone,
        description, companyID
    FROM  Workers w
    WHERE  companyID = ?
      and  w.ID IN (
        SELECT  WorkerID
            FROM  WorkerGeofences s
            WHERE  isActive <> 0
              and  s.workerID = w.ID
                   )
    limit  0,10

Also, you can use EXISTS() for that.

SELECT  w.ID, Title, FName, SName, Email, Birthday, Address, Phone,
        description, companyID
    FROM  Workers w
    WHERE  companyID = ?
      and  exists 
      ( SELECT  1
            FROM  WorkerGeofences s
            WHERE  isActive = 1
              and  s.workerID = w.ID
      )
    limit  0,10

I used exists, and I cant drop the distinct since we dont know if there are duplicates in his table @Arth — sagi, Feb 03 '16 at 14:34
Assuming each worker row is unique in workers (fairly safe assumption) you can drop the DISTINCT. — Arth, Feb 03 '16 at 14:36

score 1 · Answer 3 · answered Feb 03 '16 at 14:40

1

For completeness using JOIN:

SELECT DISTINCT w.ID,
       w.Title, 
       w.FName, 
       w.SName, 
       w.Email, 
       w.Birthday, 
       w.Address, 
       w.Phone, 
       w.description, 
       w.companyID
  FROM Workers w 
  JOIN WorkerGeofences wg
    ON wg.workerID = w.id
   AND wg.isActive = 1
 WHERE w.companyID = ? 
 LIMIT ?,10

answered Feb 03 '16 at 14:40

Arth

12,789
5
37
69

If there is more than one `isActive` row for a worker, it will generate duplicate rows. Then the `DISTINCT` kicks in to remove the dups -- thereby making it less efficient than the `EXISTS` solution. – Rick James Feb 03 '16 at 22:23
@RickJames Yep, you are probably right.. although I think the optimiser may result in the same query. I included it more for completeness, but the comment will be useful to travelling eyes, – Arth Feb 04 '16 at 09:22

optimise IN subquery

3 Answers3