7

I'm writing a big script, full of update statements that will run daily. Some of this updates will affect rows others won't.

I believe that update statements that won't affect anything is not very good practice, but my question is:

  • How badly can this affect performance?? Is this very different from having select statements and update only when select fetches something?

Thank you, Tiago

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
  • 1
    It depends on how complex your `WHERE` clause is. The `UPDATE` statement would have to figure out first what rows to update - or none. This might involve table scans, if you're lacking the proper indices on that table. As always: **test** it and see how bad it is. Of course it would be better to determine just those rows that really need updating - and then updating just those. But how much effort is it to determine those rows? – marc_s May 31 '12 at 09:25
  • Effort would involve messing around with table variables, and not just one or two ... i'm talking about 20 update statements running all together in a script and updating different tables. In terms of development effort would be much more than just develop the update statements. Regarding the where clauses in update, most of them have one or two inner joins at most and simple "is null" or "<> 0" where clauses. – Tiago Ferreira May 31 '12 at 09:53

3 Answers3

8

First and foremost, SELECT followed by UPDATE is almost always incorrect under concurrency. Unless things run under SERIALIZABLE isolation level, there is no guarantee that rows don't change between the SELECT and the UPDATE.

Second, for the cases when there are rows to update, the cost of SELECT + UPDATE is by definition higher than just an UPDATE.

And last, for the case when there aren't rows to update the cost of locating them by UPDATE is often the same as the cost of locating them by SELECT, so you didn't gain anything. I say 'often', not 'always', because the query optimizer may consider different strategies for update vs. select and a scan for update occurs under different locking (concurrency) rules than a scan for read.

What may make sense is to have a very cheap SELECT that can avoid an expensive UPDATE, even if not 100% accurate. The conditions between the SELECT and UPDATE can vary wildly As long as you don't get any false negatives (the SELECT say there shouldn't be be any row but UPDATE would had found row, had it been run) and the number of false positives is low (SELECT says there are rows, but the more precise/expensive UPDATE check doesn't actually find any).

Ultimately this is a problem of optimization and all optimization questions start by measuring. First locate the expensive UPDATE and then start to modify.

Andriy M
  • 76,112
  • 17
  • 94
  • 154
Remus Rusanu
  • 288,378
  • 40
  • 442
  • 569
2

If you mean to ask whether there's a difference between UPDATE … WHERE condition AND SELECT … WHERE the_same_condition + UPDATE … WHERE the_same_condition in terms of performance, then I would rather expect the latter to be less efficient than the former. Whether it is UPDATE or SELECT, the rows would still have to be figured out, and with the preliminary SELECT … WHERE … they would have to be figured out twice if there was a hit.

But @Matt Whitfield has got a point about triggers. If there's a trigger on update, it would fire regardless of whether any rows were updated. And if there's something the trigger does even if there are no rows updated (and you would like to avoid that), then either rewrite your trigger or, yes, go with the SELECT + UPDATE approach.

Community
  • 1
  • 1
Andriy M
  • 76,112
  • 17
  • 94
  • 154
  • Yes that was my question, and from the comments i got here I'm very into keeping the Update only approach. Just checking which tables have triggers on update and see the behaviour on those. Thanks! – Tiago Ferreira Jun 01 '12 at 14:25
1

It depends how you're doing the update. If you're updating using the joined syntax (i.e.)

UPDATE targetTable
FROM targetTable INNER JOIN sourceTable

Then you can pretty simply add WHERE clauses to only update the row if the values you want to set are actually different.

And yes, it can affect performance, particularly if you consider that something like this...

UPDATE targetTable SET column = column

...would fire a trigger that was defined on UPDATE. I'm not 100% on this, but I believe this also does then make it into the transaction log because it needs to be maintained in order for log tail backups and mirroring targets to be able to fully piece together the sequence of events.

Matt Whitfield
  • 6,436
  • 3
  • 29
  • 44