59

Is it possible to perform an update statement in sql, but only update if the updates are different?

for example

if in the database, col1 = "hello"

update table1 set col1 = 'hello'

should not perform any kind of update

however, if

update table1 set col1 = "bye"

this should perform an update.

nami
  • 1,186
  • 5
  • 19
  • 22

8 Answers8

69

During query compilation and execution, SQL Server does not take the time to figure out whether an UPDATE statement will actually change any values or not. It just performs the writes as expected, even if unnecessary.

In the scenario like

update table1 set col1 = 'hello'

you might think SQL won’t do anything, but it will – it will perform all of the writes necessary as if you’d actually changed the value. This occurs for both the physical table (or clustered index) as well as any non-clustered indexes defined on that column. This causes writes to the physical tables/indexes, recalculating of indexes and transaction log writes. When working with large data sets, there is huge performance benefits to only updating rows that will receive a change.

If we want to avoid the overhead of these writes when not necessary we have to devise a way to check for the need to be updated. One way to check for the need to update would be to add something like “where col <> 'hello'.

update table1 set col1 = 'hello' where col1 <> 'hello'

But this would not perform well in some cases, for example if you were updating multiple columns in a table with many rows and only a small subset of those rows would actually have their values changed. This is because of the need to then filter on all of those columns, and non-equality predicates are generally not able to use index seeks, and the overhead of table & index writes and transaction log entries as mentioned above.

But there is a much better alternative using a combination of an EXISTS clause with an EXCEPT clause. The idea is to compare the values in the destination row to the values in the matching source row to determine if an update is actually needed. Look at the modified query below and examine the additional query filter starting with EXISTS. Note how inside the EXISTS clause the SELECT statements have no FROM clause. That part is particularly important because this only adds on an additional constant scan and a filter operation in the query plan (the cost of both is trivial). So what you end up with is a very lightweight method for determining if an UPDATE is even needed in the first place, avoiding unnecessary write overhead.

update table1 set col1 = 'hello'
/* AVOID NET ZERO CHANGES */
where exists 
    (
    /* DESTINATION */
    select table1.col1
    except
    /* SOURCE */
    select col1 = 'hello'
    )

This looks overly complicated vs checking for updates in a simple WHERE clause for the simple scenerio in the original question when you are updating one value for all rows in a table with a literal value. However, this technique works very well if you are updating multiple columns in a table, and the source of your update is another query and you want to minimize writes and transaction logs entries. It also performs better than testing every field with <>.

A more complete example might be

update table1
   set col1 = 'hello',
       col2 = 'hello',
       col3 = 'hello'
/* Only update rows from CustomerId 100, 101, 102 & 103 */
where table1.CustomerId IN (100, 101, 102, 103)
/* AVOID NET ZERO CHANGES */
  and exists 
    (
    /* DESTINATION */
    select table1.col1
           table1.col2
           table1.col3
    except
    /* SOURCE */
    select z.col1,
           z.col2,
           z.col3
      from #anytemptableorsubquery z
     where z.CustomerId = table1.CustomerId
    )
Dude0001
  • 3,019
  • 2
  • 23
  • 38
  • 5
    Thanks for the explanation :) I got here because I had exactly the unnecessary write problem that you described. It appears that PostgreSQL on Amazon RDS will also just blindly write the rows, even if they are unchanged. So on a big table, issuing an `UPDATE users SET status=1;` might exhaust your IOPS (and thus leading to high latency afterwards) while an `UPDATE users SET status=1 WHERE status<>1;` can be satisfied from the in-memory cache and will thus not waste any precious IOPS. – fxtentacle Jul 21 '17 at 21:05
  • 2
    The last example also works well for stored procedures as the temp table can be replaced with parameters e.g. `select @param1, @param2, etc.` – manotheshark Mar 09 '20 at 12:25
  • 1
    this is not true . it DEPENDS. https://www.sql.kiwi/2010/08/the-impact-of-non-updating-updates.html – Anton Apr 21 '21 at 06:42
  • This is good for more sophisticated cases, but for simpler use-cases, the easier answer is the one from sll (https://stackoverflow.com/a/6677560/4801767) with a small correction based on the comment from rbennett485 below that answer. – ahron Jun 20 '21 at 04:40
  • Minor correction - in the first/easier version of the solution, there is an error - doing `select table1.col1 except select col1 = 'hello'` can lead to an error. Because the result of `select col1 = 'hello'` is a boolean, a bool value cannot be excluded from the result of a `select col from table` unless that result is also bools. So the second part should be something like `... except select col1 from table1 where col1 = 'hello'`. Then we are excluding a result of type `col1` from another result also of type `col1`. – ahron Jun 20 '21 at 04:44
  • @Anton is correct, it depends on multiple factors in the context of the `UPDATE` statement. "*SQL Server does not take the time to figure out whether an UPDATE statement will actually change any values or not. It just performs the writes as expected, even if unnecessary.*" is actually incorrect as a blanket statement. There are cases where SQL Server is smart enough to know when to avoid unnecessary updates, for certain cases. – J.D. May 19 '22 at 18:46
  • @J.D. I'm interested to know what those cases are. Can you elaborate or point us to the documentation. I can certainly update my answer but I am not aware of the scenarios or if there is difference between different versions of the engine. I do think most people that come across this question are battling net-zero UPDATE statements causing various performance issues, but it is good to point out if there are exceptions so people aren't chasing a red herring. – Dude0001 May 20 '22 at 12:38
  • 1
    @Dude0001 Absolutely, I'd recommend starting with the blog post that Ahron linked, which is by Paul White (who's very knowledgeable in the SQL Server community). They appear to be repeatable examples. I actually ran into this case myself with a simple statement of this nature `UPDATE TableA SET Field1 = Field1;` because I was testing a trigger I created on that table, and the trigger wasn't firing. I'm also personally unsure if this was a change across engine versions since when you wrote your initial answer, so that could be part of it too. Despite that, I agree with the rest of your post. – J.D. May 20 '22 at 18:31
38

The idea is to not perform any update if a new value is the same as in DB right now

WHERE col1 != @newValue

(obviously there is also should be some Id field to identify a row)

WHERE Id = @Id AND col1 != @newValue

PS: Originally you want to do update only if value is 'bye' so just add AND col1 = 'bye', but I feel that this is redundant, I just suppose

PS 2: (From a comment) Also note, this won't update the value if col1 is NULL, so if NULL is a possibility, make it WHERE Id = @Id AND (col1 != @newValue OR col1 IS NULL).

wrtsprt
  • 5,319
  • 4
  • 21
  • 30
sll
  • 61,540
  • 22
  • 104
  • 156
  • 1
    This seems like the simplest solution. – Phil Gan Jul 13 '11 at 10:39
  • 1
    Just curious. Which version of SQL-Server has a `==` operator? – ypercubeᵀᴹ Jul 13 '11 at 10:48
  • 11
    No matter how equals operator is defined in your Sql language, idea is matter – sll Jul 13 '11 at 10:52
  • @t-clausen.dk, explain please your -1 (I've corrected == to =, but this is not vital, obviously author of the question know how to write equals ;) and could ignore such kind of typos ;) ) – sll Jul 13 '11 at 10:56
  • @sll: Idea does matter, but yet I know none SQL product that has `==`. – ypercubeᵀᴹ Jul 13 '11 at 11:01
  • @t-clausen.dk, lol but this is what I initially wrote (unfortunatly I can't show you a change history, but later reviewed an question and corrected the answer, I'll fix this) – sll Jul 13 '11 at 11:06
  • Anyway you right when put -1 because I posted a wrong answer which really confused, I think -1 is a good rate so I can learn a lesson and post more attentively – sll Jul 13 '11 at 11:12
  • I can easily overlook the ==. Problem was that the question had changed after you posted your answer. So if both logic and syntax is wrong i give -1 – t-clausen.dk Jul 13 '11 at 11:15
  • I posted a solution, maybe overkill to a simple question, but it takes care of misunderstandings. – t-clausen.dk Jul 13 '11 at 11:19
  • 12
    Probably worth noting that this won't update the value if col1 is NULL - make it `WHERE Id = @Id AND (col1 != @newValue OR col1 IS NULL)` if col1 being null is a possibility – rbennett485 Jan 20 '16 at 12:03
  • For any future readers led here by Google, the comment of @rbennett485 is very important to account for. – ahron Jun 20 '21 at 04:36
  • does WHERE Id = '@Id AND (col1 != '@newValue OR col1 IS NULL) work when newValue is NULL too? it looks like the row will be updated when both col1 and newValue are NULL – shrimp rice Dec 16 '22 at 01:51
15

If you want to change the field to 'hello' only if it is 'bye', use this:

UPDATE table1
SET col1 = 'hello'
WHERE col1 = 'bye'

If you want to update only if it is different that 'hello', use:

UPDATE table1
SET col1 = 'hello'
WHERE col1 <> 'hello'

Is there a reason for this strange approach? As Daniel commented, there is no special gain - except perhaps if you have thousands of rows with col1='hello'. Is that the case?

ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
10

This is possible with a before-update trigger. In this trigger you can compare the old with the new values and cancel the update if they don't differ. But this will then lead to an error on the caller's site.
I don't know, why you want to do this, but here are several possibilities:

  1. Performance: There is no performance gain here, because the update would not only need to find the correct row but additionally compare the data.
  2. Trigger: If you want the trigger only to be fired if there was a real change, you need to implement your trigger like so, that it compares all old values to the new values before doing anything.
Daniel Hilgarth
  • 171,043
  • 40
  • 335
  • 443
  • 1
    Good answer. I'm in camp 2 - I want to prevent an update trigger. – Stephen Holt Nov 06 '12 at 15:04
  • I disagree with your statement that there are no performance benefits to do this check. This will vary by scenario, but see my answer. – Dude0001 Apr 04 '16 at 14:53
  • 2
    It can be helpful when there is no value changes in update but you want to avoid of changing `ModifiedDate` value. But I think it can be easily done with stored procedure as its not a bad idea to have all DML operations done only via stored procedures. – Muflix May 29 '17 at 14:07
  • 5
    On Amazon RDS, where IOPS are limited, there can be extremely strong performance benefits. Checking if the row is modified can likely be done from the in-memory cache, thus no IOPS are used. Unnecessarily writing the rows, on the other hand, will use up IOPS. Once IOPS are exhausted, your query will take minutes to write on a big table. I have seen this cause the difference between 0.1ms and 640 seconds on a big AWS RDS instance. – fxtentacle Jul 21 '17 at 21:08
  • Be it minute but there would be the network performance gains of doing a single request vs two separate ones – Miniman Dec 16 '19 at 13:51
2

You need an unique key id in your table, (let's suppose it's value is 1) to do something like:

UPDATE table1 SET col1="hello" WHERE id=1 AND col1!="hello"
2

Old question but none of the answers correctly address null values.

Using <> or != will get you into trouble when comparing values for differences if there are is potential null in the new or old value to safely update only when changed use the is distinct from operator in Postgres. Read more about it here

maxTrialfire
  • 532
  • 3
  • 16
2
CREATE OR REPLACE PROCEDURE stackoverflow([your_value] IN TYPE) AS
BEGIN
   UPDATE   [your_table] t
     SET t.[your_collumn] = [your_value]
   WHERE t.[your_collumn] != [your_value];
  COMMIT;


EXCEPTION
 [YOUR_EXCEPTION];

END stackoverflow;
Hussain
  • 5,552
  • 4
  • 40
  • 50
Tvitmsvleli
  • 247
  • 2
  • 8
-1

I think this should do the trick for ya...

create trigger [trigger_name] on [table_name]
for insert 
AS declare  @new_val datatype,@id int;
select @new_val = i.column_name from inserted i;
select @id = i.Id from inserted i;
update table_name set column_name = @new_val
where table_name.Id = @id and column_name != @new_val;