1

I'd like someone who can explain me the logic difference between these two queries. Maybe you can explain performance difference also. (DB is Microsoft Northwind).

-- Join

select distinct c.CustomerID, c.CompanyName, c.ContactName  from orders as o inner join customers as c
on o.CustomerID = c.CustomerID

-- SubQuery

select customerid, companyname, contactname, country from customers
where customerid in (select distinct customerid from orders)

Thanks in advance.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
B. Carlos
  • 11
  • 2

1 Answers1

3

The first generates an intermediate result set with all orders for all customers. It then reduces them using select distinct.

The second just selects the customers without having to reduce them later. It should be much more efficient. However, the select distinct is not needed in the subquery (it is done automatically with in).

I would write the logic as:

select c.customerid, c.companyname, c.contactname, c.country
from customers c
where exists (select 1
              from orders o
              where o.customerid = c.customerid 
             );

This can readily make use of an index on orders(customerid).

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks a lot. Now since I'm a beginner I have doubts about your query my friend. May you explain me the "exists" and "select 1" (true value return?) role in your query? – B. Carlos Aug 13 '17 at 02:23
  • I wasn't aware of the sub query distinct, thanks for the tip! – Mark Kram Aug 13 '17 at 02:35
  • Holy cow...no. The latter query is not really an improvement and the ON operator does the same filtering and is in fact much better. For one thing, the Predicate must evaluate as true for every row returned. This means that select query is actually fired lots of times. – clifton_h Aug 13 '17 at 06:49
  • @clifton_h . . . Time to learn how databases work and what a semi-join operator is. – Gordon Linoff Aug 13 '17 at 16:30
  • @Gordon.Linoff sadly, you seem not to understand the difference between an outer apply operator and a set based join. It can...be better depending on the covering indexes...they are not the same. – clifton_h Aug 13 '17 at 16:51
  • @clifton_h . . . I'm pretty sure SQL Server uses a left semi join with `exists`. That would seem to be pretty optimal under most circumstances. – Gordon Linoff Aug 13 '17 at 18:14