16

Which is more efficient?

SELECT  theField
FROM    theTable
GROUP BY theField

or

SELECT  DISTINCT theField
FROM    theTable
Cade Roux
  • 88,164
  • 40
  • 182
  • 265
wcm
  • 9,045
  • 7
  • 39
  • 64

6 Answers6

36

In your example, both queries will generate the same execution plan so their performance will be the same.

However, they both have their own purpose. To make your code easier to understand, you should use distinct to eliminate duplicate rows and group by to apply aggregate operators (sum, count, max, ...).

dub
  • 1,396
  • 12
  • 22
  • 7
    I would add that adding distinct willy nilly to remove duplicate rows can be bad idea especially if it's use disguises a cross join or missing items in a where clause. Distinct would be my last choice to remove rows. – HLGEM Dec 17 '08 at 16:55
  • @HLGEM, when would you use Distinct? – wcm Dec 17 '08 at 17:32
  • @wcm, Imagine a Order table that has a Product field which contains the type of product ordered. You want to know what are all the different products that have been ordered. A simple select will return duplicates if the same product has been ordered multiple times. Use select distinct instead. – dub Dec 17 '08 at 17:42
  • @dub, I was asking @HGLEM under what conditions he would use DISTINCT since he is saying that it would be his last choice to remove duplicate rows when it is the only time I even consider it. – wcm Dec 17 '08 at 22:44
  • 1
    @wcm I understand HLGem's point. Consider a query to return all products purchased by a specific set of customers. INNER JOIN products to sales to customers gives duplicate rows, which could be eliminated by DISTINCT. Using WHERE EXISTS to prevent row duplication should reduce server workload. – Meff Dec 21 '08 at 12:42
11

Doesn't matter, it results in the same execution plan. (at least for these queries). These kind of questions are easy to solve, by enabling query analyzer or SSMS to show the execution plan and perhaps the server trace statistics after running the query.

Frans Bouma
  • 8,259
  • 1
  • 27
  • 28
6

In most cases, DISTINCT and GROUP BY generate the same plans, and their performance is usually identical

Greg Dean
  • 29,221
  • 14
  • 67
  • 78
5

You can check the Execution Plan to look for the total cost of this statements. The answer may vary in different scenarios.

Ken Yao
  • 1,506
  • 1
  • 13
  • 24
2

Hmmm...so far as I can see in the Execution Plan for running similar queries, they are identical.

Dana
  • 32,083
  • 17
  • 62
  • 73
0

In MySQL, DISTINCT seems a bit faster than GROUP BY if theField is not indexed. DISTINCT only eliminate duplicate rows but GROUP BY seems to sort them in addition.

auntyellow
  • 2,423
  • 2
  • 20
  • 47