0

I am using SQL Server and want to do Bulk Updates but don't want any row to be re-updated if same values already exists.

For Example Consider following tables with their data

DECLARE @UpdatedIds TABLE
(
    BookId  INT
)

CREATE TABLE Book
(
    [Id]        INT,
    [Name]      NVARCHAR(MAX)
)

INSERT INTO Book([Id], [Name])
SELECT 1, 'Book1'
UNION
SELECT 2, 'Book2'
UNION
SELECT 3, 'Book3'
UNION
SELECT 4, 'Book4'
UNION
SELECT 5, 'Book5'

CREATE TABLE #Book
(
    [Id]        INT,
    [Name]      NVARCHAR(MAX)
)

INSERT INTO #Book([Id], [Name])
SELECT 1, 'Book1_Changed'
UNION
SELECT 2, 'Book2_Changed'
UNION
SELECT 3, 'Book3'
UNION
SELECT 4, 'Book4'
UNION
SELECT 5, 'Book5'

What I want is when I run the following query

UPDATE [BO]
SET [Name] = [BU].[Name]
OUTPUT [INSERTED].[Id] INTO @UpdatedIds([BookId])
FROM [Book] [BO]
INNER JOIN #Book [BU] ON [BO].[Id] = [BU].[Id]

SELECT * FROM @UpdatedIds

@UpdatedIds should have records of 1 and 2 not 3, 4, or 5

I have 2 ways of doing this in my mind and would like to have opinions about which one will be fastest/better.

  1. Using Except Keyword and delete rows that already exists

  2. Using Binary_Checksum that is comparing Checksums of records in actual table and in temp table and removing records with equal checksum

Thanks

  • 2
    `EXCEPT` is guaranteed to work. Binary checksums might have collisions. – Gordon Linoff Jan 27 '21 at 15:53
  • 1
    Though this appear that it could be on topic, asking for opinions is explicitly *off topic* for [so]. You are likely better off rewording the closing part of your question, as others may see you are asking for opinions and close the question with the **Opinion-based** close reason. – Thom A Jan 27 '21 at 15:55
  • @GordonLinoff Can you also please give me some supporting material as well – Muhammad Mutahar Alam Jan 27 '21 at 15:55
  • 1
    Is your issue that you want to update many columns? If you're only updating 1 or a few columns I highly recommend you add a WHERE to your update statement so you only update rows where the values are different. Make sure to handle nulls. – Bee_Riii Jan 27 '21 at 15:55
  • 1
    It's [documented](https://learn.microsoft.com/en-us/sql/t-sql/functions/binary-checksum-transact-sql?view=sql-server-ver15#remarks), @MuhammadMutaharAlam . Note that is states that if a value is changed, so too should the `BINARY_CHECKSUM` but there are "occasional missed changes". This is because 2 (sets of) values *could* produce the same hashed value. Hashing doesn't guarantee to create a unique value for every one unique value; just that it will be consistent for the same value and that the hash can't be reversed (easily). – Thom A Jan 27 '21 at 15:58
  • @Bee_Riii Yes There are more than one columns to be updated that is why I inner joined Id – Muhammad Mutahar Alam Jan 27 '21 at 15:58
  • 1
    @MuhammadMutaharAlam . . . You can read about hash collisions anywhere that talks about hashing. Wikipedia is a good place to start: https://en.wikipedia.org/wiki/Hash_table. – Gordon Linoff Jan 27 '21 at 15:59
  • In addition to what's been said, `CHECKSUM` and `BINARY_CHECKSUM` are very simple algorithms with very poor collision behavior, compared to an optimal hash of the same size. Even if they were optimal, though, 32 bits is not a whole lot to stave off collisions; see the [table at the bottom of this page](https://preshing.com/20110504/hash-collision-probabilities/). In certain cases hashes can be used to speed things up, but they should generally not be used as a guarantee nothing has changed. – Jeroen Mostert Jan 27 '21 at 17:53

0 Answers0