101

How does SQL engines differ when we use equal sign and IN operator have same value? Does execution time changes?

1st one using equality check operator

WHERE column_value = 'All'

2nd one using IN operator and single value

WHERE column_value IN ('All')

Does SQL engine changes IN to = if only one value is there?

Is there any difference for same in MySQL and PostgreSQL?

tfrommen
  • 177
  • 3
  • 17
Somnath Muluk
  • 55,015
  • 38
  • 216
  • 226
  • 36
    Don't try to learn millions of rules that will somehow mean that you'll always write the absolute best-performing code possible. Write clear, simple, *understandable* code that obviously produces correct results. Set performance goals. Then *measure* your code's performance. If it performs adequately, move on. Only if it isn't performing well should you spend more time on it. And at that time, trying out a trivial change like the one in your question and then *measuring* again is straightforward. Maybe one is quicker than the other, but *does it make a significant difference*? – Damien_The_Unbeliever Jun 15 '16 at 07:19
  • 3
    Agreed @Damien_The_Unbeliever . But sometimes I am getting more concerned while writing single line or word with SQL – Somnath Muluk Jun 15 '16 at 07:23
  • 1
    They are 99.9999999% the same. As long as you don't do something like `WHERE column_value IN ((select value from list_of_values where value = 'All' limit 1))` then you are fine. Sub-queries wreak tremendous havok on `in()` clause performance. – MonkeyZeus Jun 15 '16 at 19:57
  • 1
    @MonkeyZeus there is also not much truth to that based on the question at hand due to the Cost Based Optimizer (CBO) in mysql. – Drew Jun 17 '16 at 06:48
  • Similar question for MS Sql Server [sql statements with equals vs in](//stackoverflow.com/q/9476332) – Michael Freidgeim Apr 20 '18 at 03:46

6 Answers6

111

There is no difference between those two statements, and the optimiser will transform the IN to the = when IN has just one element in it.

Though when you have a question like this, just run both statements, run their execution plan and see the differences. Here - you won't find any.

After a big search online, I found a document on SQL to support this (I assume it applies to all DBMS):

If there is only one value inside the parenthesis, this commend [sic] is equivalent to,

WHERE "column_name" = 'value1

Here is the execution plan of both queries in Oracle (most DBMS will process this the same):

EXPLAIN PLAN FOR
select * from dim_employees t
where t.identity_number = '123456789'

Plan hash value: 2312174735
-----------------------------------------------------
| Id  | Operation                   | Name          |
-----------------------------------------------------
|   0 | SELECT STATEMENT            |               |
|   1 |  TABLE ACCESS BY INDEX ROWID| DIM_EMPLOYEES |
|   2 |   INDEX UNIQUE SCAN         | SYS_C0029838  |
-----------------------------------------------------

And for IN() :

EXPLAIN PLAN FOR
select * from dim_employees t
where t.identity_number in('123456789');

Plan hash value: 2312174735
-----------------------------------------------------
| Id  | Operation                   | Name          |
-----------------------------------------------------
|   0 | SELECT STATEMENT            |               |
|   1 |  TABLE ACCESS BY INDEX ROWID| DIM_EMPLOYEES |
|   2 |   INDEX UNIQUE SCAN         | SYS_C0029838  |
-----------------------------------------------------

As you can see, both are identical. This is on an indexed column. Same goes for an unindexed column (just full table scan).

Benjamin W.
  • 46,058
  • 19
  • 106
  • 116
sagi
  • 40,026
  • 6
  • 59
  • 84
  • 1
    Do you have any official reference for any database for this? I tried searching but didn't get. – Somnath Muluk Jun 15 '16 at 07:18
  • @SomnathMuluk:- I don't think there is any such official reference from MySQL on this. You can create a test case and test it yourself if you find any difference between the result. – Rahul Tripathi Jun 15 '16 at 07:23
  • 7
    @RahulTripathi I found a document on SQL in general, I suppose it applies for all DBMS. – sagi Jun 15 '16 at 07:25
  • @sagi:p- The document is correct but it is not official reference from MySQL which OP asked for. However it doesn't make your answer incorrect. And yes, it would give an added advantage to OP's query;) – Rahul Tripathi Jun 15 '16 at 07:26
  • 4
    I looked this up a while ago but wasn't able to fine any document about this, great answer ! –  Jun 15 '16 at 12:41
  • 4
    The quote is not from a discussion on performance. I think it only means "equivalent" in the sense that the two statements have the same meaning, and isn't making any claims about the optimizer. Still I agree that this will surely be optimized to be the same. –  Jun 15 '16 at 15:23
  • Using `EXPLAIN` in PostgreSQL I also found empirically that `IN` gets optimized to `=` for a single filter value – Addison Klinke Apr 07 '21 at 15:27
  • 1
    This answer is not correct. There is a difference when using '=' and IN when using a subquery, and when the primary query depends on other views. I can't post the full detail here right now, but we just had a query where changing from = to IN resulted in a speed change of 2 minutes down to 3 seconds. This is in PostgreSQL. – Freeman Helmuth Aug 25 '21 at 15:52
12

There are no big differences really, but if your column_value is indexed, IN operator may not read it as an index.

Encountered this problem once, so be careful.

sagi
  • 40,026
  • 6
  • 59
  • 84
John
  • 129
  • 5
9

There is no difference when you are using it with a single value. If you will check the table scan, index scan, or index seek for the above two queries you will find that there is no difference between the two queries.

Is there any difference for same in Mysql and PostgresSQL?

No it would not have any difference on the two engines(Infact it would be same for most of the databases including SQL Server, Oracle etc). Both engines will convert IN to =

Rahul Tripathi
  • 168,305
  • 31
  • 280
  • 331
5

Teach a man to fish, etc. Here's how to see for yourself what variations on your queries will do:

mysql> EXPLAIN SELECT * FROM sentence WHERE sentence_lang_id = "AMH"\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: sentence
         type: ref
possible_keys: sentence_lang_id
          key: sentence_lang_id
      key_len: 153
          ref: const
         rows: 442
        Extra: Using where

And let's try it the other way:

mysql> EXPLAIN SELECT * FROM sentence WHERE sentence_lang_id in ("AMH")\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: sentence
         type: ref
possible_keys: sentence_lang_id
          key: sentence_lang_id
      key_len: 153
          ref: const
         rows: 442
        Extra: Using where

You can read here about how to interpret the results of a mysql EXPLAIN request. For now, note that we got identical output for both queries: exactly the same "execution plan" is generated. The type row tells us that the query uses a non-unique index (a foreign key, in this case), and the ref row tells us that the query is executed by comparing a constant value against this index.

alexis
  • 48,685
  • 16
  • 101
  • 161
4

For single IN Clause,there is no difference..below is demo using an EMPS table i have..

select * from emps where empid in (1)
select * from emps where empid=1

Predicate for First Query in execution plan:

[PerformanceV3].[dbo].[Emps].[empID]=CONVERT_IMPLICIT(int,[@1],0)

Predicate for second query in execution plan:

[PerformanceV3].[dbo].[Emps].[empID]=CONVERT_IMPLICIT(int,[@1],0)

If you have multiple values in IN Clause,its better to convert them to joins

TheGameiswar
  • 27,855
  • 8
  • 56
  • 94
4

Just to add a different perspective, one of the main points of rdbms systems is that they will rewrite your query for you, and pick the best execution plan for that query and all equivalent ones. This means that as long as two queries are logically identical, the should always generate the same execution plan on a given rdbms.

That being said, many queries are equivalent (same result set) but only because of constraints the database itself is unaware of, so be careful about those cases (E.g for a flag field with numbers 1-6, the db doesn't know <3 is the same as in (1,2)). But at the end of the day, if you're just thinking about legibility of and and or statements it won't make a difference for performance which way you write them.

Adam Martin
  • 1,188
  • 1
  • 11
  • 24