During a migration project, I'm faced with an update of 4 million records in our SQL Server database.
The update is very simple: a boolean field needs to be set to true/1, and the input I have is a list of all the ids for which this field must be set (one id per line).
I'm not exactly an expert when it comes to SQL tasks of this size, so I started out with a single UPDATE statement containing a "WHERE xxx IN ({list of ids, separated by commas})" clause. I first tried this with a million records. On a small dataset on a test server this worked like a charm, but in the production environment it failed with an error. I shortened the list of ids a couple of times, but to no avail.
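For reference, the first attempt looked roughly like this (table and column names here stand in for the real ones, and the id list was of course far longer):

```sql
UPDATE yyy
SET booleanfield = 1
WHERE id IN ('id0001', 'id0002', 'id0003' /* , ... up to a million ids ... */);
```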
The next thing I tried was to turn each id in the list into its own UPDATE statement ("UPDATE yyy SET booleanfield = 1 WHERE id = '{id}'"). Somewhere, I read that it's good to have a GO every x number of lines, so I inserted a GO every 100 lines (using the excellent 'sed' tool, ported from Unix).
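The generated script therefore looked roughly like this (ids are placeholders; in reality each batch held 100 statements):

```sql
UPDATE yyy SET booleanfield = 1 WHERE id = 'id0001';
UPDATE yyy SET booleanfield = 1 WHERE id = 'id0002';
-- ... 98 more single-row updates ...
GO
UPDATE yyy SET booleanfield = 1 WHERE id = 'id0101';
-- ... and so on, with a GO after every 100 statements
GO
```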
So, I split the list of 4 million update statements into parts of 250,000 each, saved them as .sql files, and loaded and ran the first one in SQL Server Management Studio (2008). Note that I also tried SQLCMD.exe, but to my surprise it ran about 10-20 times slower than Management Studio.
The run took about 1.5 hours and ended with "Query completed with errors". The Messages pane, however, contained a long list of "1 row(s) affected" and "0 row(s) affected" lines, the latter for ids that were not found.
Next, I checked the number of updated records in the table using a COUNT(*) and found a difference of a couple of thousand records between the number of update statements and the number of updated records.
I first thought this might be due to the non-existent ids, but even after subtracting the number of "0 row(s) affected" messages in the output, there was still a mysterious gap of 895 records.
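The verification was a simple count of flagged rows, roughly (same placeholder table/column names as above):

```sql
-- Number of rows that actually ended up with the flag set.
SELECT COUNT(*) AS updated_rows
FROM yyy
WHERE booleanfield = 1;
```

I then compared this count against the number of statements in the script, minus the "0 row(s) affected" messages.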
My questions:
Is there any way to find a description and cause of the errors behind "Query completed with errors"?
How can the mysterious gap of 895 records be explained?
What is a better, or the best, way to do this update? (I'm starting to think that what I'm doing could be very inefficient and/or error-prone.)