
Comparing these statements

SELECT * FROM Table
WHERE
  (field IS NULL OR field = 'empty_value')

vs.

SELECT * FROM Table
WHERE
  COALESCE(field, 'empty_value') = 'empty_value'

in terms of performance, which one is better? Is there any difference?

I already know the first one requires a slightly more complex filter operation, while the second one needs one extra scalar computation prior to the filter. The performance impact of doing this on a single table is insignificant, but what may happen when the query is complex? And when multiple fields are in this situation? Should I prefer one over the other?

PS: I am using 'empty_value' as a generic way to describe a situation where the absence of a particular value (=NULL) and a particular value (='empty_value') have the same meaning for the query. Changing anything in the table design or how it stores its values is not an option.

PS2: I am using SQL Server, but I would like a more generic answer about this issue. That said, I'll stick to SQL Server if the answer is implementation-dependent.
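For reference, a minimal sketch showing that the two predicates really do select the same rows. This uses SQLite via Python's sqlite3 module purely for illustration (the question is about SQL Server); the table `t` and column `field` are stand-ins for the question's generic "Table"/"field":

```python
import sqlite3

# Hypothetical table mirroring the question's generic "Table"/"field".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, field TEXT)")
conn.executemany(
    "INSERT INTO t (field) VALUES (?)",
    [("a",), ("empty_value",), (None,), ("b",), (None,)],
)

# Form 1: explicit OR over the NULL check and the value check.
or_rows = conn.execute(
    "SELECT id FROM t WHERE field IS NULL OR field = 'empty_value' ORDER BY id"
).fetchall()

# Form 2: fold NULL into the value with COALESCE, then compare once.
coalesce_rows = conn.execute(
    "SELECT id FROM t WHERE COALESCE(field, 'empty_value') = 'empty_value' ORDER BY id"
).fetchall()

print(or_rows == coalesce_rows)  # the two predicates select the same rows
```

So the question is purely about performance, not semantics: both forms treat NULL and 'empty_value' identically.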

Vadim Kotov
Diego Queiroz
  • use explain to look up the execution plan for both queries and compare. – Twelfth Mar 16 '18 at 15:00
  • Look up execution plans, look up read statistics on the execution, and run each 20 times and benchmark runtime. – dfundako Mar 16 '18 at 15:02
  • As stated in the question, I already know the differences each change will make in the resulting query. But the answer can't just be "do both, benchmark, pick one". I am looking for a more theoretical approach to decide which one is better when I write SQL. – Diego Queiroz Mar 16 '18 at 15:06
  • What does "better" mean *for you*? Which one do you like better? How are we supposed to extrapolate to "complex"? What does complex mean *for you*? – Aaron Bertrand Mar 16 '18 at 15:10
  • There is no right or wrong answer here. It may be that with a single column being compared there is little or no difference, perhaps because the column isn't indexed or the table's row count is small. But then on a large table there is a different answer. And if you start comparing multiple columns, like in a search query, it changes again. There is no answer of the form "pattern X is the best". The second option here is non-sargable, though, and I would steer away from that in general. – Sean Lange Mar 16 '18 at 15:10
  • And yet "benchmark" *is* the correct answer in the most general sense. Anyone who wants to convince you that one or the other will always be better regardless of database system, indexing structure, statistics, version, optimizer flags etcetera is selling you something. (All that said, for SQL Server specifically, `A IS NULL OR A = value` *tends* to be better than `COALESCE(A, value) = value`, as the former can be satisfied with a seek and the latter induces a scan.) – Jeroen Mostert Mar 16 '18 at 15:10
  • https://blogs.sentryone.com/aaronbertrand/backtobasics-updated-kitchen-sink-example/ – Aaron Bertrand Mar 16 '18 at 15:11
  • This is starting to lean towards a catch-all type of query. Gail Shaw has an excellent article on the topic and the performance impact of several different approaches here. https://www.sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/ – Sean Lange Mar 16 '18 at 15:11
  • @SeanLange, there's an answer, and your comment was close to what I was searching for. What there isn't is a single answer that covers all situations. Anyway, it's the first time I've heard the term 'sargable', so I googled it. Very interesting. Technically speaking, we're probably better off with sargable queries, so I would say the first one is better in most cases. – Diego Queiroz Mar 16 '18 at 17:06

2 Answers


The COALESCE() precludes the use of an index (i.e. it is not sargable), while the OR offers the chance of using one. However, this can get complicated if there are other conditions in the query.

Benchmarking the solution in one case will not give a definitive answer. If indexes are not an issue, I would not expect the performance to differ very much. However, the short-circuiting of the OR would give it an advantage even in that case.
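The sargability point can be made concrete with a sketch. This uses SQLite via Python's sqlite3 (not SQL Server, so the optimizer details differ, but the pattern is the same): wrapping the column in COALESCE hides it from the index, while the OR form can still be satisfied by index searches. Table `t`, column `field`, and index `ix_field` are hypothetical names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, field TEXT)")
conn.execute("CREATE INDEX ix_field ON t (field)")

# Enough rows that the planner prefers the index when it can use it.
conn.executemany(
    "INSERT INTO t (field) VALUES (?)",
    [(f"value_{i}",) for i in range(1000)] + [(None,), ("empty_value",)],
)
conn.execute("ANALYZE")

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail); join the details.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

p_or = plan("SELECT id FROM t WHERE field IS NULL OR field = 'empty_value'")
p_coalesce = plan("SELECT id FROM t WHERE COALESCE(field, 'empty_value') = 'empty_value'")

print(p_or)        # SEARCH: each OR term is satisfied from the index
print(p_coalesce)  # SCAN: the expression prevents an index seek on field
```

The exact plan text varies by SQLite version, but the COALESCE form reliably shows a SCAN where the OR form can show a SEARCH, which is the generic shape of the sargability argument above.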

Gordon Linoff
  • Be careful bandying about the term "short-circuiting". T-SQL explicitly does not short circuit; logically all parts of an expression are always evaluated, even if the optimizer can fortuitously eliminate some parts sometimes. The question uses constant values, and then such elimination indeed happens, but only because we're dealing with constants. (I don't imagine I'm telling *you* anything new, but innocent readers should be protected.) – Jeroen Mostert Mar 16 '18 at 15:30
  • @JeroenMostert, I disagree with you. As a matter of programming style, if short-circuiting is possible, it's OK to presume it in a discussion about performance (even if the particular language you're using doesn't make use of it for whatever reason). I don't think it's a dangerous assumption. – Diego Queiroz Mar 16 '18 at 16:57
  • @DiegoQueiroz: it matters very much when you think that `WHERE Column <> 0 AND 1 / Column = 3` will not give a division by zero error, because in T-SQL, it *might*. (Or, and this is the awful thing, it might *not*, depending on the execution plan and what rows are actually encountered.) You are much better off with a mental model where T-SQL does not short circuit. – Jeroen Mostert Mar 16 '18 at 16:59
  • Err, we are talking about performance, right? If we're benefiting from short-circuiting, we should at least know what it does. We're all consenting adults here; it's up to you to avoid errors in your code. – Diego Queiroz Mar 16 '18 at 18:14
  • @DiegoQueiroz: my exact point was "making clear what it does". In most languages where short-circuiting is a thing, it is not even optional: in C (and all C-like languages), `if (column != 0 && 1 / column == 3)` is the most natural and idiomatic way to write this expression, and no conforming compiler is allowed to ever produce a division by zero from it. In T-SQL, there is no short-circuiting and that approach will not work. Yes, we're all consenting adults, but we're not all informed adults -- this is a site for asking questions, after all. :-) – Jeroen Mostert Mar 16 '18 at 19:44
  • @JeroenMostert . . . You might be interested in this SQL Fiddle: http://www.sqlfiddle.com/#!18/49262/4. – Gordon Linoff Mar 17 '18 at 03:14
  • Not really. I already acknowledged I know that the optimizer will eliminate parts of an expression *sometimes*, even if the language does not logically short-circuit. Your toy example obviously qualifies. At this point, though, I feel I'm better off making it a separate question, if I don't want to be stuck constructing an actual counterexample (which may involve setting up a big table). I already know they exist because I ran into them in my day job, of course, but that's been a while. – Jeroen Mostert Mar 17 '18 at 07:26
  • @JeroenMostert. . . Feel free to ask a separate question. I should point out that the order of the conditions doesn't matter, so `1/i = 0 or 1=1` also doesn't generate an error. The optimizer/execution engine is doing something. – Gordon Linoff Mar 17 '18 at 12:54
  • My point is that this thread is all about performance. If we start chatting about other things, such as the errors a specific construction may lead to, we'll need to start a new thread. From this perspective, @GordonLinoff's comment was perfect: you may get benefits from short-circuiting. Great enough to cite, and he did. – Diego Queiroz Mar 17 '18 at 13:38

Not an answer as such, but this is too much information for a comment and it is relevant to the question.

There is a third option which springs to mind in this instance:

SELECT *
FROM Table
WHERE field IS NULL 
UNION ALL
SELECT *
FROM Table
WHERE field = 'empty_value'

This is sargable but splits the query into two parts, so it's up to you to test all the alternatives you have.
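One thing worth checking about this rewrite: because `field IS NULL` and `field = 'empty_value'` can never both be true for the same row, UNION ALL is safe here and introduces no duplicates (so the cheaper UNION ALL can be used instead of UNION). A quick sketch, again using SQLite via Python purely for illustration, with hypothetical table `t` and column `field`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, field TEXT)")
conn.executemany(
    "INSERT INTO t (field) VALUES (?)",
    [("a",), (None,), ("empty_value",), ("b",)],
)

# The UNION ALL rewrite from the answer above.
combined = conn.execute("""
    SELECT id FROM t WHERE field IS NULL
    UNION ALL
    SELECT id FROM t WHERE field = 'empty_value'
    ORDER BY id
""").fetchall()

# The original single-query OR form.
original = conn.execute(
    "SELECT id FROM t WHERE field IS NULL OR field = 'empty_value' ORDER BY id"
).fetchall()

print(combined == original)  # True: the branches are mutually exclusive,
                             # so UNION ALL produces no duplicate rows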

MJH