Should I avoid IN() because slower than EXISTS()

Question

Possible Duplicate:
SQL Server IN vs. EXISTS Performance

Should I avoid IN() because slower than EXISTS()?

SELECT * FROM TABLE1 t1 WHERE EXISTS (SELECT 1 FROM TABLE2 t2 WHERE t1.ID = t2.ID)

VS

SELECT * FROM TABLE1 t1 WHERE t1.ID IN(SELECT t2.ID FROM TABLE2 t2)

From my investigation, I set SHOWPLAN_ALL. I get the same execution plan and estimation cost. The index(pk) is used, seek on both query. No difference.

What are other scenarios or other cases to make big difference result from both query? Is optimizer so optimization for me to get same execution plan?

"What are other scenarios or other cases to make big difference result from both query" - the execution plan depends on a lot of things so you always need to check it — zerkms, Mar 13 '12 at 10:06

Bohemian · Answer 1 · 2012-03-13T10:15:52.923

3

Do neither. Do this:

SELECT DISTINCT T1.* 
FROM TABLE1 t1
JOIN TABLE2 t2 ON t1.ID = t2.ID;

This will out perform anything else by orders of magnitude.

edited Mar 13 '12 at 10:15

answered Mar 13 '12 at 10:10

Bohemian

412,405
93
575
722

1

If there is a 1..M relation between `TABLE1` and `TABLE2` this will give you a different result. – Mikael Eriksson Mar 13 '12 at 10:12
@MikaelEriksson You're right. I added `DISTINCT`. Thanks. – Bohemian Mar 13 '12 at 10:16
@Bohemian Thank you for your answer. If you kind enough to point me that When i should use IN(). Or should i avoid IN() as much as posible? – Pongsathon.keng Mar 13 '12 at 10:19
1

@Pongsathon.keng: Bohemian's approach performs better on MySQL, but not on SQLServer - see Diego's answer. In general, it is better to use EXISTS rather than IN, except where you have a list of values rather than a table - eg: `SELECT * FROM TABLE1 WHERE ID IN (1, 7, 16);` – Mar 13 '12 at 10:54

score 2 · Answer 2 · edited Mar 13 '12 at 10:37

2

Both queries will produce the same execution plan (assuming no indexes were created): two table scans and one nested loop (join).

The join, suggested by Bohemian, will do a Hash Match instead of the loop, which I've always heard (and here is a proof: Link) is the worst kind of join.

Among IN and EXIST (your actuall question), EXISTS returs better performance (take a lok at: Link)

edited Mar 13 '12 at 10:37

Arion

31,011
10
70
88

answered Mar 13 '12 at 10:23

Diego

34,802
21
91
134

Aitor · Answer 3 · 2015-04-10T08:45:43.923

If your table T2 has a lot of records, EXISTS is the better approach hands down, because when your database find a record that match your requirement, the condition will be evaluated to true and it stopped the scan from T2. However, in the IN clause, you're scanning your Table2 for every row in table1. IN is better than Exists when you have a bunch of values, or few values in the subquery. Expandad a little my answer, based on Ask Tom answer: In a Select with in, for example:

Select * from T1 where x in ( select y from T2 )

is usually processed as:

select * 
  from t1, ( select distinct y from t2 ) t2
 where t1.x = t2.y;

The subquery is evaluated, distinct'ed, indexed (or hashed or sorted) and then joined to the original table (typically).

In an exist query like:

select * from t1 where exists ( select null from t2 where y = x )

That is processed more like:

for x in ( select * from t1 )
   loop
      if ( exists ( select null from t2 where y = x.x )
      then 
         OUTPUT THE RECORD
      end if
   end loop

It always results in a full scan of T1 whereas the first query can make use of an index on T1(x).

When is where exists appropriate and in appropriate?

Use EXISTS when... Subquery T2 is huge and takes a long time and T1 is relatively small and executing (select null from t2 where y = x.x ) is very very fast
Use IN when... The result of the subquery is small -- then IN is typicaly more appropriate.
If both the subquery and the outer table are huge -- either might work as well as the other -- depends on the indexes and other factors.

Not very clear on last statement... is better with a bunch of values or a few values. only reason i'm downvoting for lack of clarity at this time. update your response and i'll try and up vote if more detail provided. — sheldonhull, Apr 09 '15 at 20:59

Should I avoid IN() because slower than EXISTS()

3 Answers3