The problem:

Something on our virtualized database server is using massive amounts of our pagefile. We noticed it about a week ago when the OS drive went from 30+GB of free space to around 500KB of free space in an afternoon. I traced the lost space to a huge pagefile (75+GB). I turned off the system-managed page file and split it into equal parts across 4 logical disks (on 4 different physical arrays). Where we used to run out of both hard drive space and memory, we now only get out-of-memory errors, even though the server appears to have several gigabytes of unused RAM.

I don't know how to locate the exact cause. I have run a few tools that I hoped would make the source of the problem clear, but nothing has been obvious enough to me.

The details:

  • Virtualized Windows Server 2008 R2 with SQL Server 2008 running on it
  • 32 GB statically assigned RAM allocated to the VM
  • SQL Server was configured to use 18GB
  • Small MySQL instance running as well
    • query_cache_limit set at 8MB
    • query_cache_size set at 128MB
  • Has the domain controller role and is a global catalog (yeah, I know it shouldn't be a domain controller, but we have limited resources)
  • Pagefile broken up into 4 parts on 4 logical disks, each logical disk is its own vhd on its own physical array on the host virtual server
    • Page file is set to 8192MB min, 12288MB max for each part
    • Original Page file was 48GB and dynamically expanded
  • Threads and processes remain around the same numbers during the problem and while the problem is not occurring - Threads: ~720-750, Processes: ~62
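
As a rough sanity check on the numbers above, the total commit limit can be estimated from the RAM and page file sizes. This is only a sketch: it assumes the Windows commit limit is approximately physical RAM plus the combined page file size, and the function name is made up for illustration.

```python
# Values taken from the configuration list above.
RAM_GB = 32
PAGEFILE_COUNT = 4
PAGEFILE_MIN_MB = 8192
PAGEFILE_MAX_MB = 12288


def commit_limit_gb(ram_gb, pagefile_mb, count):
    """Approximate commit limit: physical RAM plus total page file size.

    Assumption: Windows reports slightly less than this, since a small
    part of RAM is not counted toward the limit.
    """
    return ram_gb + (pagefile_mb * count) / 1024


low = commit_limit_gb(RAM_GB, PAGEFILE_MIN_MB, PAGEFILE_COUNT)
high = commit_limit_gb(RAM_GB, PAGEFILE_MAX_MB, PAGEFILE_COUNT)
print(f"approximate commit limit: {low:.0f} GB to {high:.0f} GB")
```

If the committed total shown in Task Manager is close to that range while the errors occur, the limit being hit would be committed memory rather than physical RAM, which might explain errors appearing while RAM still looks free.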

Things I've tried:

  • Limiting SQL Server to less RAM - 14GB - with no effect
  • Stopping and disabling the MySql5.5 service, with a restart afterward
  • Increasing SQL Server to use more RAM - 28GB - with no effect
  • running RAMMap by Sysinternals - nothing out of the ordinary showed up

I can't permanently stop the MySQL service, and the SQL Server service needs to keep running during the daytime. Memory or pagefile usage seems to come in surges: remoting into the server isn't even possible due to a lack of resources, and then shortly afterward I can connect again. Minutes later I won't even be able to open Notepad or Task Manager. Numerous errors related to insufficient memory pop up on the screen (I don't have them handy since the problem isn't occurring at this moment, but I will update with the various errors when they occur).

The whole time this is occurring, Task Manager says that there are several GB of free physical memory (between 2GB and 12GB, depending on the memory allocated to SQL Server).

One thing that I suspect might have initially played a role: one array on the host server had a failed disk and another was in predicted failure (RAID 5, 3 disks), so if writes were delayed I thought they might be piling up in memory or in the pagefile.

Is there anything I can try first to find what is causing the high page file usage, ideally a per-process breakdown of how much pagefile and physical memory each process is using? Or is there any way to tell whether this memory usage is a symptom of a more serious problem with the hardware or OS?
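
To make the kind of breakdown I'm after concrete, here is a minimal sketch in Python. It assumes `wmic` is available on the server and that `Win32_Process` reports `PageFileUsage` in KB and `WorkingSetSize` in bytes; the function names (`parse_processes`, `top_by_pagefile`, `live_report`) are hypothetical, and I have not verified the output against this particular server.

```python
import csv
import subprocess

# Assumption: wmic is present and emits CSV with a leading "Node" column.
WMIC_CMD = ["wmic", "process", "get",
            "Name,ProcessId,PageFileUsage,WorkingSetSize", "/format:csv"]


def parse_processes(csv_text):
    """Turn wmic CSV output into dicts with sizes normalized to MB."""
    # wmic pads its output with blank lines, so drop them before parsing.
    lines = [ln.strip() for ln in csv_text.splitlines() if ln.strip()]
    rows = []
    for rec in csv.DictReader(lines):
        try:
            rows.append({
                "name": rec["Name"],
                "pid": int(rec["ProcessId"]),
                # Assumed units: PageFileUsage in KB, WorkingSetSize in bytes.
                "pagefile_mb": int(rec["PageFileUsage"]) / 1024,
                "workingset_mb": int(rec["WorkingSetSize"]) / (1024 * 1024),
            })
        except (KeyError, TypeError, ValueError):
            continue  # skip malformed or partial rows
    return rows


def top_by_pagefile(rows, n=10):
    """Return the n processes with the largest page file usage."""
    return sorted(rows, key=lambda r: r["pagefile_mb"], reverse=True)[:n]


def live_report(n=10):
    """Run wmic on the local machine and print the top n processes."""
    out = subprocess.run(WMIC_CMD, capture_output=True, text=True).stdout
    for r in top_by_pagefile(parse_processes(out), n):
        print(f'{r["pid"]:>6}  {r["name"]:<28}'
              f'{r["pagefile_mb"]:>10.0f} MB pagefile'
              f'{r["workingset_mb"]:>10.0f} MB working set')
```

Calling `live_report()` on the server during a surge would print the ten largest consumers. A list like this only covers per-process usage, so memory held by the kernel or by drivers would not show up in it.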

wmb