
This is a really dangerous situation for me as a novice DBA.

Sometimes when I see a deadlock in the Activity Monitor screen, if a query doesn't end, stays in the SUSPENDED state, and is also the head blocker, I have to kill that spid. Not always, but sometimes, though rarely, after killing the spid a lot of data is lost from the database. I guess it loses the data from the beginning of the lock until the moment I kill it. As I understand it, SQL Server does not save data immediately; it waits for a while (some kind of cycle, maybe every 15 minutes) and saves the data when it thinks everything is OK.

For example, I take a full backup every 6 hours and transaction log backups every 10 minutes. Last night the backup at 00:00 couldn't run for some reason and its spid was waiting in the SUSPENDED state. When I killed that spid at 08:30 in the morning, I lost all data in all tables from 00:00 until 08:30.

I have the full recovery model and am using Microsoft SQL Server 2012.

Losing data in a production database is a very big risk. My question is: how can I be sure that SQL Server has really saved the data before I kill a spid?

Rapunzo
  • Perhaps you should fix the actual problem, not the symptoms... Also, deadlocks are automatically detected and a victim is selected for termination. Why would you even need to be doing this manually? – Mitch Wheat Aug 16 '14 at 09:05
  • Because until I kill that locked spid, some tables cannot be read. – Rapunzo Aug 16 '14 at 09:10
  • 2
    You must be rolling back uncommitted transactions. There is no 15 minute save cycle. Once a transaction commits the changes must persist. If you terminate the transaction before that the changes must be rolled back. – Martin Smith Aug 16 '14 at 09:55
  • You should be able to take a full backup regardless of whether there are other spids in a suspended state (unless those other processes are also attempting a full backup at the same time). Also, **when a transaction is committed**, SQL Server ensures that all changes have been written to disk before returning to the client. The only exception is if you enable "delayed durability" on your server, which was new in SQL 2014 and so is not applicable to your SQL 2012 instance. How do you know that you are losing data when you kill a spid? – BateTech Aug 16 '14 at 13:44
  • Try running DBCC CHECKDB against your database to make sure it is not corrupted (http://msdn.microsoft.com/en-us/library/ms176064.aspx). This can be resource intensive, so if you run it against a production instance, try to run it at a non-peak time. – BateTech Aug 16 '14 at 13:49
  • Deadlocks are monitored internally by the Database Engine and resolved automatically. There is no "save cycle" in MSSQL. A full backup will not help you if a transaction is not committed. The full recovery model is useless unless you also take transaction log backups. AFAIK it is possible to commit a transaction in another spid, but I would recommend finding someone more experienced. – Roger Wolf Aug 17 '14 at 03:24
  • @MartinSmith But I am losing data from a table where I don't use a transaction for the insert query. – Rapunzo Aug 18 '14 at 05:25
  • @BateTech I can tell the data is lost from printed documents. It's a small MRP system; in the factory, operators print stickers onto fabric rolls, so there are stickers but no data. – Rapunzo Aug 18 '14 at 05:28
  • @Ennor I was once reading in depth about the philosophy of the SQL engine, and as far as I remember there was a save cycle; that's why I started to think this way. BTW, I am also taking transaction log backups. – Rapunzo Aug 18 '14 at 05:34
  • @BateTech Also, there is no corruption in my database; CHECKDB found 0 allocation errors and 0 consistency errors. – Rapunzo Aug 18 '14 at 06:08
  • By "losing data", do you mean that you observe that the transaction that was in progress and belonged to the spid you killed is rolled back, or do you observe that other data is lost in the process? I ask because the first is fully expected. – Lasse V. Karlsen Aug 18 '14 at 06:32
  • @LasseV.Karlsen There is one table for which I don't use a transaction on insert; it works with a direct insert query. I lost data when I killed the 8-hour-old locked backup spid. – Rapunzo Aug 18 '14 at 10:37

2 Answers


There really isn't a way to prevent data loss in the situations you describe. SQL Server is designed to detect deadlocks and to automatically choose a victim to kill (unless of course you use DEADLOCK_PRIORITY to specify which query is less important). This means that a rollback must occur and SQL Server must do housekeeping to ensure data consistency. You're interfering with that. There's simply no way around losing the uncommitted work.
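If certain queries are safe to retry, you can at least control which side of a deadlock gets sacrificed. A minimal sketch of DEADLOCK_PRIORITY (the table and update are hypothetical):

```sql
-- Volunteer this session as the preferred deadlock victim, so that if a
-- deadlock occurs, SQL Server kills and rolls back THIS query rather
-- than the one it conflicts with.
SET DEADLOCK_PRIORITY LOW;

BEGIN TRANSACTION;
    -- hypothetical, retry-safe update
    UPDATE dbo.StickerLog
    SET PrintedCount = PrintedCount + 1
    WHERE RollId = 42;
COMMIT TRANSACTION;
```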

Say you have two queries trying to use the same resources and a deadlock arises. After a certain amount of time SQL Server detects this and decides to kill one thread. Because SQL Server adheres to the principles of ACID, the query isn't just stopped dead; its transaction begins to roll back. If this query has made a lot of changes, that means SQL Server must read back through the transaction log and undo all of those changes before the thread is stopped. This means that a very, very long time can elapse between SQL Server finding a deadlock and the deadlock being resolved. AT NO POINT should you attempt to expedite the process by killing deadlocking SPIDs.
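If a KILL has already been issued (by you or by the deadlock monitor) and the rollback seems to hang, you can watch its progress instead of intervening again. A sketch, assuming the session in question is spid 53:

```sql
-- Report progress for a session that is already rolling back. This does
-- NOT kill anything new; it only prints an estimated completion
-- percentage and time remaining for the rollback.
KILL 53 WITH STATUSONLY;

-- Alternatively, watch it through the DMVs: percent_complete is
-- populated for ROLLBACK and a handful of other long-running commands.
SELECT session_id, command, status, percent_complete
FROM sys.dm_exec_requests
WHERE session_id = 53;
```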

This is more of an organizational and operational limitation than a technical one. You and the employees using your SQL Servers must be aware that once a query starts, it MUST finish. Whether the query completes normally, encounters an error and must roll back, or is chosen as the victim in a deadlock scenario and must be rolled back, ALL queries must run to their end. Knowing this, you should move forward with the mindset that you CANNOT kill SPIDs just because they're taking a long time or because they are deadlocked.

If you are being hounded by stakeholders to kill SPIDs because of lost productivity, educate them as to why the offending queries must run to completion and what may happen if you intervene (PRODUCTION DATA LOSS). Speak in terms of risks to the business instead of "we should" or "we shouldn't". If the stakeholders are not convinced and still want you to kill SPIDs, escalate to your management and have them make the decision. If you are management, document very clearly that the stakeholders are asking you to do something dangerous and have that documentation ready. Trust me, they WILL ask why the production server is down all day, and you need to be able to clearly document all the players and their roles.

Also, educate the employees using the servers about breaking up large transactions into smaller ones, or using explicit BEGIN/COMMIT blocks. This way, if there is an issue and a query must be rolled back, it takes minutes or hours instead of days. In the last 2 years we have had an explosion of data at my office, and we now have several tables with more than a billion rows each. The learning period was very painful: we had many weeks where productivity was in the toilet because people tried to do huge updates or build very large sets of data, there was an error, and the subsequent rollback ran for DAYS. After we learned our lesson and implemented some standard operating procedures to break queries down into smaller batches, things got better. Still, I shudder to think what would have happened if the DBAs had just started killing SPIDs.
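As a sketch of that batching idea (the table name, date cutoff, and batch size are hypothetical), a big purge can be split into many small transactions so that any single rollback only ever has to undo one batch:

```sql
-- Purge in small batches: each transaction commits quickly, so a
-- rollback only ever undoes one 5,000-row batch, not days of work.
DECLARE @rows int = 1;

WHILE @rows > 0
BEGIN
    BEGIN TRANSACTION;

    DELETE TOP (5000)
    FROM dbo.BigHistoryTable          -- hypothetical billion-row table
    WHERE CreatedAt < '20120101';

    SET @rows = @@ROWCOUNT;           -- capture before COMMIT resets it

    COMMIT TRANSACTION;
END;
```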

The long and short of it is that there's nothing you can do to prevent data loss if you continue to kill SPIDs. You must let SQL Server manage the query until it has either completed, or been killed and finished rolling back. If you try to manually kill these queries, you will lose data. There's no way around that.

Further reading:

http://msdn.microsoft.com/en-us/library/aa480356.aspx

http://technet.microsoft.com/en-us/library/aa213030%28v=sql.80%29.aspx

https://www.simple-talk.com/sql/database-administration/handling-deadlocks-in-sql-server/

Patrick Tucci
  • Thank you very much for all the explanations and for sharing your experiences. I am going to find out how the SQL engine works in deadlock situations. That time I killed the backup spid because it had been blocking the system for 8 hours, but now I understand it was a mistake. – Rapunzo Aug 18 '14 at 06:01

After wrapping my database operations in `using` blocks in the application code, all of the blocking problems disappeared.
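For readers wondering what that looks like: in C#, a `using` block guarantees that the connection and transaction are disposed even if an exception is thrown, so an abandoned connection can't sit on its locks indefinitely. A minimal sketch; the connection string, table, and helper class are hypothetical:

```csharp
using System.Data.SqlClient; // Microsoft.Data.SqlClient on newer stacks

public static class StickerRepository
{
    // The using blocks guarantee Dispose() runs even when an exception is
    // thrown: the connection returns to the pool, and a transaction that
    // was never committed is rolled back instead of holding locks open.
    public static void InsertSticker(int rollId)
    {
        using (var conn = new SqlConnection(
            "Server=.;Database=Mrp;Integrated Security=true"))
        {
            conn.Open();
            using (var tran = conn.BeginTransaction())
            using (var cmd = new SqlCommand(
                "INSERT INTO dbo.StickerLog (RollId) VALUES (@rollId)",
                conn, tran))
            {
                cmd.Parameters.AddWithValue("@rollId", rollId);
                cmd.ExecuteNonQuery();
                tran.Commit(); // Dispose() without Commit() rolls back
            }
        }
    }
}
```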

Rapunzo