5

Well, when we are going to select records which include nothing in a column, we can use this:

SELECT * FROM my_table WHERE NULLIF(my_column, '') IS NULL;

But, in most cases, I see developers use this:

SELECT * FROM my_table WHERE my_column IS NULL OR my_column = ''

Is there any preference between these two? (e.g. performance, default support, version support, etc.)

If yes, please explain.

Mojtaba
  • 4,852
  • 5
  • 21
  • 38
  • I'm kinda thinking you should change accepted answer to Solarflare's. Mine deals with the function vs `OR` case; their answer gives a definitive reason why `... IS NULL OR ...` is superior. – Uueerdo Aug 04 '16 at 18:36
  • @Uueerdo, Generous and Gentle ;) – Mojtaba Aug 04 '16 at 18:37

5 Answers5

4

Evaluating the two expressions on a single row should not take such a big difference. The reason people use the second version is the use of indexes. Mysql has a special optimization for this and can use it even with the or.

See IS NULL Optimization

IS NULL Optimization

MySQL can perform the same optimization on col_name IS NULL that it can use for col_name = constant_value. For example, MySQL can use indexes and ranges to search for NULL with IS NULL.

If a WHERE clause includes a col_name IS NULL condition for a column that is declared as NOT NULL, that expression is optimized away. This optimization does not occur in cases when the column might produce NULL anyway; for example, if it comes from a table on the right side of a LEFT JOIN.

MySQL can also optimize the combination col_name = expr OR col_name IS NULL, a form that is common in resolved subqueries. EXPLAIN shows ref_or_null when this optimization is used.

Community
  • 1
  • 1
Solarflare
  • 10,721
  • 2
  • 18
  • 35
2

As per my knowledge best way with respect to execution time or performance is :

SELECT * FROM my_table WHERE my_column = ''

Reference : How do I check if a column is empty or null in mysql

Community
  • 1
  • 1
Naveed Ramzan
  • 3,565
  • 3
  • 25
  • 30
  • I couldn't find any performance comparison in the post you linked – Mojtaba Aug 04 '16 at 18:09
  • 1
    Nevertheless, as I said in my Answer above, "Naveed's solution is **obvious** to anyone who looks at it." And that "wins the game, in my book, every single time." I'm not really concerned about microseconds. But, I really am concerned about *clarity.* I don't want a *human* reader to have to go scrambling for the MySQL documentation *(as I did ...)* to figure out what this statement means, and to ensure that it is actually doing what I think it does and what I want. – Mike Robinson Aug 04 '16 at 18:22
2

Calling a function on a field (in a condition) automatically discards any potential indexes on that field that could be used to speed up the query. Using OR in a list of conditions generally does the same (but for all indexes).

However, with the former, you end up making a function call for every row; with the latter there is the possibility to take advantage of short circuit evaluation.

Of course, there is a third option which can make use of indexes...

SELECT * FROM my_table WHERE my_column IS NULL
UNION
SELECT * FROM my_table WHERE my_column = ''

but then the cost of the UNION/two queries may outweigh the benefit of index use; depending on actual data.

Uueerdo
  • 15,723
  • 1
  • 16
  • 21
  • It's an interesting question whether the query-optimizer can "optimize away" `NULLIF()` and not *actually* handle it as a true function-call. (Would it instead replace it with a `CASE` as the documentation suggests? It *could...)* ## I'm not "comfortable at first glance" that your `UNION`-based alternative would really work, *and* simply "would rather not have to take a second or third glance figure out if it actually does or not." I daresay that the query optimizer *would* be smart-enough to use indexes. So, I'd rather see "simple" than "clever." *But,* "that's just me." – Mike Robinson Aug 04 '16 at 18:25
  • The UNION-based alternative is always my very last choice as it results in almost duplicating a query (which can be worse than actually duplicating it), however I can confirm from experience that MySQL will not optimize OR clauses to have the same benefit. _It's applicability also becomes less likely as the full set of conditions becomes more complicated, which may be why MySQL does not try to optimize it in a similar manner._ @MikeRobinson – Uueerdo Aug 04 '16 at 18:30
1

The two constructs are not the same. From the MySQL documentation page:

NULLIF(expr1,expr2)

Returns NULL if expr1 = expr2 is true, otherwise returns expr1. 
This is the same as:
  CASE WHEN expr1 = expr2 
  THEN NULL 
  ELSE expr1 
  END.

Therefore, this logic is testing a column for "NULL or an empty string."

  • If my_column is presently NULL, the function will return its value (NULL ...) because it is not equal to an empty-string.
  • If my_column is an empty string, the function will return NULL.
  • If any other value, that value will be returned.

I heartily agree with Naveed's answer in saying, "this is 'cute' ... don't be 'cute'." His answer ("use OR ...") results in very simple SQL that is instantly obvious at first glance. This "clever" solution is not. (And it is probably slightly-less efficient, to boot.)

And, of course, you'd really like to avoid the necessity for this sort of thing when designing your database. You never want to store "empty strings" in a VARCHAR-type field unless it truly is the case that "we have a non-NULL value, and that value really is 'an empty string.'"

Mike Robinson
  • 8,490
  • 5
  • 28
  • 41
  • Got it. About your last paragraph, I'm not always the one who designs the DB from the scratch. sometimes, I face to a table with millions of records and I'm not allowed to change the structure. BTW, thanks for your explanation – Mojtaba Aug 04 '16 at 18:22
  • 1
    I don't know if I'd call it "being cute". In this particular context it seems like overkill, but in a larger set of conditions (with potentially numerous checks like this) using this form _(or `IFNULL(x, '') = '')_ could make the query more readable. ...and the guys who have to "read" the data are not always the ones that write it. – Uueerdo Aug 04 '16 at 18:23
  • Oh, yes, Mojtaba, *that* "is the state of the real world!" **:-)** We *all* have to deal with databases containing millions of records that *"are* as they are." – Mike Robinson Aug 04 '16 at 18:38
0

Generally, the reason why people will include the ='' portion is because SQL Server stores that as an empty string, which is not the same as NULL.

Link to reference: openshac's answer

Community
  • 1
  • 1
Justin
  • 154
  • 1
  • 17
  • I know that. Both of the queries I put do the same. My question is which one is better? – Mojtaba Aug 04 '16 at 18:17
  • I would include both. It's not going to cost you much on performance, and this way you ensure it is doing what you want. – Justin Aug 04 '16 at 18:22