There really isn't a way to prevent data loss in the situations you describe. SQL Server is designed to detect deadlocks and to automatically choose a victim to kill (unless of course you use DEADLOCK_PRIORITY
to specify which query is less important). This means that rollback must occur and SQL Server must do housekeeping to ensure data consistency. You're interfering with that. There's simply no way around losing data.
Say you have two queries trying to use a resource and a deadlock arises. After a certain amount of time SQL Server detects this and decides to kill one thread. Because SQL Server adheres to the principles of ACID, the query isn't just automatically stopped, it begins rollback. If this query has made a lot of changes, that means SQL Server must scroll through the logs and undo all the changes before the thread is stopped. This means that a very, very long time can elapse between SQL Server finding a deadlock and the deadlock being resolved. AT NO POINT should you attempt to expedite the process by killing deadlocking SPIDs.
This is more of an organizational and operational limitation than a technical one. You and the employees using your SQL Servers must be aware that if you start a query, it MUST finish. This means whether the query completes, encounters an error and must roll back, is chosen to be killed in a deadlock scenario and must be rolled back, etc, ALL queries must finish. Knowing this, you should be moving forward with the mindset that you CANNOT kill SPIDs because they're taking a long time or because they are deadlocked. If you are getting hounded by stakeholders to kill SPIDs because of loss of productivity, educate them as to why the offending queries must run to completion and what may happen if you intervene (PRODUCTION DATA LOSS). Speak in terms of risks to the business instead of "we should" or "we shouldn't". If the stakeholders are not convinced and still want you to do something like kill SPIDs, escalate to your management and have them make the decision. If you are management, document very clearly that the stakeholders are asking you to do something dangerous and have that documentation ready. Trust me, they WILL ask why the production server is down all day and you need to be able to clearly document all the players and their roles.
Also, educate the employees using the servers about breaking up large transactions into smaller ones, or using BEGIN
/COMMIT
. This way, if there is an issue and a query must be rolled back, it takes minutes or hours instead of days. In the last 2 years we have had an explosion of data at my office and we now have several tables with more than a billion rows each. The learning period was very painful: we had many weeks were productivity was in the toilet because people tried to do huge updates or build very large sets of data, there was an error, and the subsequent rollback was going for DAYS. After we learned and implemented some standard operating procedures to break down queries into smaller batches, things got better. Still, I shudder to think what would have happened if the DBAs would have just started killing SPIDs.
The long and short is there's nothing you can do to prevent data loss if you continue to kill SPIDs. You must let SQL Server continue to manage the query until it is completed or killed and finished rollback. If you try to manually kill these queries, you will lose data. There's no way around that.
Further reading:
http://msdn.microsoft.com/en-us/library/aa480356.aspx
http://technet.microsoft.com/en-us/library/aa213030%28v=sql.80%29.aspx
https://www.simple-talk.com/sql/database-administration/handling-deadlocks-in-sql-server/