
We have noticed that remote desktop sessions to a server freeze when the server is under high CPU load.

Environment:

  • VMware ESX 4.0u1.
  • Guest OS is Windows Server 2008 R2 (this is the freezing server).
  • Guest OS has MS SQL Server 2008 R2 (10.50.4000) and in-house applications running as Windows services.
  • The remote desktop client is typically on a Windows 7 laptop.

Connecting to the server via RDP works fine. When the server comes under load, this happens:

  • An existing RDP session becomes unresponsive and appears to freeze; at the very least, the screen is no longer updated. If Task Manager is open, it stops refreshing its statistics every few seconds. Clicking a button still has an effect, but the visual response is delayed by several minutes.
  • A console session via VMware's administration tool appears to freeze in the same way. (So it seems to affect the GUI and interactivity in general.)
  • Trying to connect via RDP in this state takes an extremely long time, and often just shows a black screen that never resolves to the actual GUI.
  • Other services continue to respond! If the web application running on the server is accessed from a browser on another machine, it responds fairly quickly and appears almost unaffected by the high load. A remote process monitor that accesses the "freezing" server using WMI also keeps working.

The load in this scenario typically consists of Process A doing a sequential (single-threaded) mix of calculations and calls to Process B. When it receives such a call, Process B typically makes a database call, does some calculations, and then returns the result to Process A.

In the remote process monitor we can confirm that Process A, Process B and SQL Server together take up 100% CPU. However, because the calls between the processes are sequential, there should never really be more than one process ready to run at any given point in time. These processes are Windows services and do not interact with the GUI in any way.

It's as if Windows completely starves the GUI of CPU cycles when other processes are causing load.

I've run some experiments just to check. For example, if I run three copies of a busy loop on my laptop, each takes up 33% CPU and the total is reported as 100%, but the Windows GUI in general remains fully responsive.
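For anyone who wants to repeat the busy-loop check, here is a minimal Python sketch of that kind of test. It is not the exact program used above; the names `burn` and `run_burners`, the number of copies and the duration are all assumptions chosen for illustration.

```python
import multiprocessing
import time
from typing import List


def burn(seconds: float) -> int:
    """Spin on the CPU for roughly `seconds`; return the number of loop iterations."""
    deadline = time.perf_counter() + seconds
    iterations = 0
    while time.perf_counter() < deadline:
        iterations += 1
    return iterations


def run_burners(copies: int, seconds: float) -> List[int]:
    """Run `copies` busy loops in separate processes and collect their iteration counts."""
    with multiprocessing.Pool(processes=copies) as pool:
        return pool.map(burn, [seconds] * copies)


if __name__ == "__main__":
    # Hypothetical settings: three copies for a few seconds.
    # Increase the duration to leave time to observe GUI responsiveness.
    print(run_burners(copies=3, seconds=3.0))
```

While this runs, total CPU usage should be reported as fully loaded on a machine with three or fewer cores, which makes it a simple baseline to compare against the SQL Server case.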

What causes the server GUI to freeze like this under load, and what can be done to stop it from doing so?

The VM has 6 GB RAM, SQL Server is restricted to 2 GB RAM, and the other services involved typically use less than 200 MB each. So it does not appear to be memory exhaustion.

The services run at Normal priority. I've also tried lowering them to "Below Normal", with no real change in behavior.

Update 1

In an attempt to narrow down the problem, I've tried this:

  • On the server, at normal priority, run a custom-made process which is just a busy loop. As intended, this maxes out the CPU at 100%. During this time the system is still perfectly responsive for the interactive user.
  • Issue a CPU- and data-intensive query to SQL Server (`select * from dbo.Table where Name like '%flarp%'` repeated 6 times in the same command batch). The table has 1.6 million records. No other process is taking significant CPU resources. When the query is executed, the GUI freezes completely until the query batch has completed. I then set the SQL Server priority to LOW and repeated the test. The GUI still freezes.
  • Try both of the above at the same time. I started the CPU loop (at normal priority) first, and it takes 100%. When I start the SQL query shortly afterwards (in the LOW priority SQL Server), the GUI freezes completely. The remote process monitor indicates that SQL Server, despite being low priority, receives 100% CPU while my CPU loop (at normal priority) is at 0% until the query has completed. So despite having a lower priority, SQL Server completely starves the pure CPU loop.
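One way to put a number on the starvation seen in these tests, instead of judging by how the GUI looks, is to measure how late a short sleep wakes up. The following is a minimal Python sketch and was not part of the experiments above; the function name `measure_wakeup_delays`, the sample count and the interval are assumptions.

```python
import time
from typing import List


def measure_wakeup_delays(samples: int, interval: float) -> List[float]:
    """Sleep `interval` seconds `samples` times; return how late each wake-up was, in seconds."""
    delays = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(interval)
        elapsed = time.perf_counter() - start
        # Anything beyond the requested interval is time spent waiting to be scheduled
        # (plus timer granularity); clamp at zero in case of rounding.
        delays.append(max(0.0, elapsed - interval))
    return delays


if __name__ == "__main__":
    # Hypothetical settings: 20 samples, 50 ms apart.
    delays = measure_wakeup_delays(samples=20, interval=0.05)
    print("max wake-up delay: %.3f s" % max(delays))
```

Running this at normal priority during the busy-loop test and again during the SQL query batch would show whether an ordinary process is delayed by milliseconds or by seconds in each case.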
Oskar Berggren
  • Can you determine if the SQL server is under CPU pressure? As a DBA, I've seen some crazy queries load up our CPUs and bring the system to a halt. At least from a DBA's perspective, I'd try to rule each application out, one at a time. Also, check and make sure the SQL server 'Priority Boost' option is NOT checked. – Kris Gruttemeyer Sep 24 '14 at 18:36
  • `what can be done to stop it from doing so?` Throw more hardware at it so that it isn't under heavy load. – Zoredache Sep 24 '14 at 18:47
  • So, instead of trying to deduce the exact problem and potentially fix an issue that could get worse and cause more issues, you just want to throw more hardware at it? Some companies don't have the ability to just do that, or won't do so until all other options are exhausted. I don't subscribe to the 'add more hardware' mentality as it simply treats the symptoms, not the underlying problems. A perfect example: if SQL Server priority boost were enabled, adding more CPU would alleviate the symptoms but wouldn't actually fix the problem. – Kris Gruttemeyer Sep 24 '14 at 18:57
  • @Zoredache We've tried having the SQL server on another vm on the same host. This seems to avoid the problem, but probably only because the sequential calls between the processes for this workload prevents the CPU load from reaching 100% (for more than 1 second or two) when one vm has to wait on another vm. It's still not a satisfactory solution - it's the job of a modern OS to share the CPU between processes. – Oskar Berggren Sep 24 '14 at 19:13
  • @KrisGruttemeyer "Boost SQL Server priority" is turned off. – Oskar Berggren Sep 24 '14 at 19:18