0

is there a better way to write this SQL than using WHERE ... IN (subquery)?

  SELECT device.mac, reseller.name, agent.name
  FROM device
  LEFT JOIN global_user
    ON device.global_user_id = global_user.id
  LEFT JOIN agent
    ON global_user.id = agent.global_user_id
  LEFT JOIN reseller
    ON global_user.id = reseller.global_user_id
        OR agent.reseller_id = reseller.id
  WHERE device.global_user_id IN (
    SELECT global_user_id
        FROM reseller
        WHERE id = '200'
  ) OR device.global_user_id IN (
    SELECT global_user_id
        FROM agent
        WHERE reseller_id = '200'
  );

im trying to get a list of all of the devices, with some reseller/agent details, under a particular reseller. this would include devices assigned directly to the reseller and devices assigned to agents under the reseller. reseller.id is unique. it will be executed on postgresql database.

devices are assigned to both agents and resellers. agents are assigned to resellers.

this query works, but i haven't often used an OR in a JOIN and i usually try to avoid subqueries. this query concept will be used often, so i'd like to make sure i haven't overlooked something.

thank you for any feedback.

  • the question, as it appears in the title, is "is there a better way to write this SQL than using WHERE ... IN (subquery)?" i've added it to the body for clarification. –  Feb 05 '10 at 20:43
  • @NickLarsen yes, reseller.id is unique. –  Feb 05 '10 at 20:43

3 Answers3

3

You could give this a whirl:

 SELECT device.mac, reseller.name, agent.name
  FROM device
  JOIN
  (
      SELECT global_user_id
      FROM reseller
      WHERE id = '200'
      UNION
      SELECT global_user_id
      FROM agent
      WHERE reseller_id = '200'
  ) r ON device.global_user_id = r.global_user_id
  LEFT JOIN global_user
    ON device.global_user_id = global_user.id
  LEFT JOIN agent
    ON global_user.id = agent.global_user_id
  LEFT JOIN reseller
    ON global_user.id = reseller.global_user_id
        OR agent.reseller_id = reseller.id

Clarification: Always a good idea to try out different variations of a query to make sure you end up with the best performing query (although often, different variations result in the same execution plan being generated by the query optimiser). Speaking from SQL Server point of view, the order in which the query is processed in means the JOINs are processed first before the WHERE clause. So in theory, this JOIN approach should scale the resultset down earlier.

AdaTheDev
  • 142,592
  • 28
  • 206
  • 200
  • For clarification, why would you have him use this instead of the original query? – Nick Larsen Feb 05 '10 at 20:36
  • @NickLarsen - clarification added – AdaTheDev Feb 05 '10 at 20:42
  • @AdaTheDev You make a great point about the join, for a little more reading to anyone who visits this thread, check out this article on order of operations for sql server queries, http://www.bennadel.com/blog/70-SQL-Query-Order-of-Operations.htm – Nick Larsen Feb 05 '10 at 21:06
  • @AdaTheDev this query does produce the correct result set. thank you for your clarification and comments on order of operations. one of the other answers eliminates the use of the subqueries altogether: http://stackoverflow.com/questions/2210204/is-there-a-better-way-to-write-this-sql-than-using-where-in-subquery/2210314#2210314 –  Feb 05 '10 at 21:18
  • @AdaTheDev against a larger data set, my original query was taking about 2.5 seconds to return. this query seems to average about a second quicker. the solution i linked above, that also returns a correct reseult, takes about 40 seconds to run on the same larger dataset. –  Feb 05 '10 at 21:57
1

How about this?

SELECT d.mac, r.name, a.name
FROM device as d, global_user as g, agent as a, reseller as r
WHERE d.global_user_id = g.id
  AND g.id = a.global_user_id
  AND (g.id = r.global_user_id OR a.reseller_id = r.id)
  AND (r.id = '200' OR a.reseller_id = '200');
Wade
  • 91
  • 3
  • Because reseller.id is unique from comments original poster comments. I would still go with the ON clauses, but the point was you can remove the subqueries all together. – Nick Larsen Feb 05 '10 at 20:48
  • Certain database flavors would be able to optimize that, but it would probably destroy performance on most servers. That said, it is succinct. – eswald Feb 05 '10 at 20:50
  • @Wade this won't show devices that are assigned directly to the reseller. this will only show the devices assigned to an agent. this will produce an incorrect result set. –  Feb 05 '10 at 20:58
1

I try to avoid sub-queries and IN clause if it is easy to replace them. If I understood you DB model correctly, this query should produce the same result:

SELECT      DISTINCT
            device.mac, reseller.name, agent.name
FROM        device
LEFT JOIN   global_user
        ON  device.global_user_id = global_user.id
LEFT JOIN   agent
        ON  global_user.id = agent.global_user_id
LEFT JOIN   reseller
        ON  global_user.id = reseller.global_user_id
        OR  agent.reseller_id = reseller.id
WHERE       reseller.id = '200'
        OR  agent.reseller_id = '200'
van
  • 74,297
  • 13
  • 168
  • 171
  • @van before i started using the subqueries, i almost had your suggestion in use, but i was slightly off. this does produce the correct result set, without the use of subqueries. –  Feb 05 '10 at 21:10
  • @van when running this query against a larger data set, it takes 40 seconds to return a result as opposed to my original query that takes 2.5 seconds. –  Feb 05 '10 at 21:55
  • @dan2342: very well maybe that this query is not optimal. Ideally you could split it into two queries navigating 2 of the possible paths from `device` to `reseller`. Also, if the query with sub-select first makes such a great performance, I can assume that you might be missing some `indices` on the database which might be key for the performance of your query. – van Feb 09 '10 at 13:29