I recently discovered an error in a client API. It sends pages of unique records based on timestamps, but every so often it sends the exact same record at the end of one page and again at the start of the next page.
I was operating under the assumption that there were no duplicates (this is an event log, after all), so I was doing pure inserts instead of going through the trouble of an update or some sort of 'on duplicate' step.
My question is: what's the best way to get rid of the existing duplicates, and then have MS SQL quietly discard duplicate records on insert? The duplicate is definitely inserted AFTER the original has been written in an earlier transaction. These duplicates only show up about once every 10,000 rows, so something very memory-light would be great.
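For the one-time cleanup, I was thinking of something along these lines (dbo.EventLog, event_id, and etl_batch_id are placeholder names for my actual table; see the next paragraph for why I order by the batch ID):

```sql
-- Remove existing duplicates, keeping the earliest copy of each event.
-- Table and column names are placeholders for my real schema.
WITH dupes AS (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY event_id     -- the supposedly-unique key
               ORDER BY etl_batch_id     -- keep the copy from the first batch
           ) AS rn
    FROM dbo.EventLog
)
DELETE FROM dupes
WHERE rn > 1;   -- every copy after the first is a duplicate
```

Is a ROW_NUMBER delete like this reasonable for a table this size, or is there a cheaper approach?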
The only part of the duplicate records that differs is the "ETL batch ID"; everything else is identical. Each record is otherwise supposed to carry a unique "event id". If I make this event id a unique index/key, can I somehow force MS SQL to silently drop any rows with duplicate keys? Or will the application get an error and the query stop partway through?
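From what I've read, the IGNORE_DUP_KEY option on a unique index might be what I want: duplicate rows are discarded with a warning ("Duplicate key was ignored.") while the rest of the insert succeeds, instead of the whole statement failing. Something like this (again, names are placeholders):

```sql
-- Unique index that silently discards rows whose key already exists,
-- rather than raising an error and aborting the insert.
CREATE UNIQUE INDEX UX_EventLog_event_id
    ON dbo.EventLog (event_id)
    WITH (IGNORE_DUP_KEY = ON);
```

Is that the right tool here, or does it have gotchas (performance, replication, etc.) that make an explicit 'on duplicate' step the better choice?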