We are experiencing random BSODs inside a customer's VMware-hosted Remote Desktop server. The BSODs happen nearly weekly. Curiously, the virtual machine does not write any dumps we could analyze: no minidumps in %systemroot%\minidump and no full dumps either. The only hint the server gives us is an entry in its event log saying it had to reboot because of a critical error. The details of this log entry show BugcheckCode 252 / 0xFC, which is ATTEMPTED_EXECUTE_OF_NOEXECUTE_MEMORY.

MSDN advises analyzing the dumps and looking for the faulting driver stored in KiBugCheckDriver. Since no dumps are written, we obviously don't have this option.

The faulting server runs Windows Server 2008 R2, hosted on VMware ESXi 5.5.0. The installed and configured roles are Remote Desktop server, file server, print server and web server. Other virtual machines running on the same host don't seem to be affected by this problem.

Here is the output we get in the event's details:

BugcheckCode 252 
BugcheckParameter1 0xfffff88001e64fb8 
BugcheckParameter2 0x800000000293e963 
BugcheckParameter3 0xfffff88015c55eb0 
BugcheckParameter4 0x2 
SleepInProgress false 
PowerButtonTimestamp 0 
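To make the event details easier to read, here is a minimal Python sketch that parses the fields above and interprets them. It assumes the usual meaning of the 0xFC parameters (parameter 1 is the virtual address whose execution was attempted, parameter 2 is the page table entry) and the x86-64 PTE layout, where bit 63 is the no-execute bit and bit 0 is the present bit. The function names and the small lookup table are illustrative only, not part of any Windows tooling.

```python
# Hypothetical helper: parse the "name value" pairs from the event details.
KNOWN_BUGCHECKS = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0x1A: "MEMORY_MANAGEMENT",
    0xFC: "ATTEMPTED_EXECUTE_OF_NOEXECUTE_MEMORY",
}

SAMPLE = """\
BugcheckCode 252
BugcheckParameter1 0xfffff88001e64fb8
BugcheckParameter2 0x800000000293e963
BugcheckParameter3 0xfffff88015c55eb0
BugcheckParameter4 0x2
SleepInProgress false
PowerButtonTimestamp 0
"""


def parse_event_details(text):
    """Turn 'Name value' lines into a dict of ints and booleans."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, raw = parts
        if raw in ("true", "false"):
            fields[name] = raw == "true"
        else:
            fields[name] = int(raw, 0)  # base 0 accepts both 252 and 0xfc
    return fields


def summarize(fields):
    """Interpret the fields, assuming the 0xFC parameter layout."""
    code = fields["BugcheckCode"]
    pte = fields["BugcheckParameter2"]
    return {
        "code_hex": f"0x{code:02X}",
        "name": KNOWN_BUGCHECKS.get(code, "unknown"),
        "address": hex(fields["BugcheckParameter1"]),
        "pte_no_execute": bool((pte >> 63) & 1),  # NX bit of an x86-64 PTE
        "pte_present": bool(pte & 1),             # present bit
    }


if __name__ == "__main__":
    for key, value in summarize(parse_event_details(SAMPLE)).items():
        print(f"{key}: {value}")
```

Under those assumptions, the page table entry in parameter 2 has its no-execute bit set, which is consistent with the bug check name.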

What we have tried so far:

  • Disabled automatic reboot in the system settings. We can only do this after our customer's closing time, because it is one of the most important production servers they work with. Automatic reboot stayed disabled for all of the following test scenarios:
    • We crashed the virtual machine on purpose, using NotMyFault from Sysinternals. The BSODs happened and looked pretty "normal" to us. The BSOD said it had finished dumping information to disk, but just as with the random BSODs our customer experiences, absolutely no dumps were written.
    • We tried setting the size of pagefile.sys manually to different sizes (up to 2 times RAM), with the same results.
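For reference, the dump settings that the system dialog edits are stored under the CrashControl registry key. The following is a rough Python sketch, assuming Python is available on the server, that reads those values and describes them. The helper names are made up for illustration, and only the CrashDumpEnabled values 0 to 3 are decoded; anything else is reported as unrecognized.

```python
import sys

# Registry location of the crash dump settings (under HKEY_LOCAL_MACHINE).
CRASH_CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"

# Meaning of the CrashDumpEnabled value.
DUMP_TYPES = {
    0: "none",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump (minidump)",
}


def describe_crash_control(values):
    """Describe a dict of CrashControl values in plain terms."""
    mode = values.get("CrashDumpEnabled")
    return {
        "dump_type": DUMP_TYPES.get(mode, f"unrecognized value {mode!r}"),
        "dumps_disabled": mode == 0,
        "auto_reboot": bool(values.get("AutoReboot")),
        "dump_file": values.get("DumpFile"),
        "minidump_dir": values.get("MinidumpDir"),
    }


def read_crash_control():
    """Read the relevant values from the registry (Windows only)."""
    import winreg  # only available on Windows

    names = ("CrashDumpEnabled", "DumpFile", "MinidumpDir", "AutoReboot")
    result = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
        for name in names:
            try:
                result[name], _type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                result[name] = None  # value not present
    return result


if __name__ == "__main__" and sys.platform == "win32":
    for key, value in describe_crash_control(read_crash_control()).items():
        print(f"{key}: {value}")
```

This only reports what is configured; it does not explain why a configured dump is not written.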

Some of the event log entries don't seem to contain any valuable information at all:

BugcheckCode 0
BugcheckParameter1 0x0 
BugcheckParameter2 0x0 
BugcheckParameter3 0x0 
BugcheckParameter4 0x0 
SleepInProgress false 
PowerButtonTimestamp 0 

Long story short, the main question is why absolutely no dumps are written to disk. Analyzing the dump itself should be the most direct approach to this error.

If I can supply more information or forgot something, just ask :)

HannesS
  • Your main problem seems to be that no minidump file is written. And the crashing system is not VMware but W2008. Without this information the title seems misleading. – marsh-wiggle Feb 04 '15 at 10:16
  • Edited, sorry for the confusion. – HannesS Feb 04 '15 at 11:23
  • Check if Windows Error Reporting Service is running. – duenni Feb 04 '15 at 13:53
  • Hi duenni, thanks for the reply. In fact, the service was set to manual and not running. I started the service and set it to start automatically, then crashed the machine again, but there is still no minidump written. – HannesS Feb 05 '15 at 11:11
  • That is strange. Does the folder ```C:\Windows\Minidump``` exist? If not, create it. Any Anti-Virus solutions installed? – duenni Feb 10 '15 at 14:20
  • The folder exists, and the antivirus is Avira Server Security. – HannesS Feb 11 '15 at 14:42

1 Answer

If you are using Intel E5 CPUs, check this KB article: http://kb.vmware.com/kb/2073791

Symptoms

When running a virtual machine with Windows 2008 R2, Red Hat Enterprise Linux or Solaris 10 64-bit, you may experience one of these symptoms:

  • Windows 2008 R2 blue screen events:
    • 0x0000000a - IRQL_NOT_LESS_OR_EQUAL
    • 0x0000001a - MEMORY_MANAGEMENT
    • 0x000000fc - ATTEMPTED_EXECUTE_OF_NOEXECUTE_MEMORY

Cause

At the time of publication (September 10, 2014), these processors are identified as being affected:

  • Processors named as Intel® Xeon® Processor E5-#### v2, where #### is a 4-digit number, optionally followed by a letter.
  • Processors named as Intel® Xeon® Processor E7-#### v2, where #### is a 4-digit number.
  • Processors named as Intel® Xeon® Processor E3-12## v2, where ## is a 2-digit number optionally followed by a letter.
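To check a host's CPU model string against the naming patterns listed above, something like the following Python sketch could be used. The regular expressions are my own reading of those patterns, and the example strings in the usage section are assumed typical brand strings, not output taken from the affected host.

```python
import re

# Patterns transcribed from the KB wording; matching is case-insensitive
# because some brand strings spell the suffix as "V2".
AFFECTED_PATTERNS = [
    re.compile(r"\bE5-\d{4}[A-Z]?\s+v2\b", re.IGNORECASE),    # E5-#### v2, optional letter
    re.compile(r"\bE7-\d{4}\s+v2\b", re.IGNORECASE),          # E7-#### v2
    re.compile(r"\bE3-12\d{2}[A-Z]?\s+v2\b", re.IGNORECASE),  # E3-12## v2, optional letter
]


def is_affected(cpu_model):
    """Return True if the model string matches one of the listed families."""
    return any(pattern.search(cpu_model) for pattern in AFFECTED_PATTERNS)


if __name__ == "__main__":
    # Assumed example strings, for illustration only.
    examples = [
        "Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz",
        "Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz",
        "Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz",
    ]
    for model in examples:
        print(f"{model}: {'affected' if is_affected(model) else 'not listed'}")
```

A match only means the CPU belongs to a listed family; whether the host is actually fixed depends on its BIOS and ESXi versions.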

Resolution

This is a known issue affecting VMware ESXi. Contact your vendor for an updated BIOS for your hardware to resolve this issue and provide reference to the relevant Intel Errata:

  • CA135 - A MOV to CR3 When EPT is Enabled May Lead to an Unexpected Page Fault or an Incorrect Page Translation in the Errata section of the Intel Xeon Processor E5 v2 Product Family document.
  • CF124 - Incorrect Page Translation when EPT is enabled in the Errata section of the Intel Xeon Processor E7 v2 Product Family document.

Note: The preceding links were correct as of July 11, 2014. If you find a link is broken, provide feedback and a VMware employee will update the link.

If there is no BIOS update available for your platform, use one of the following to fix this issue:

Upgrades: This issue is resolved in ESXi 5.5 Update 2, available at VMware Downloads. For more information, see VMware ESXi 5.5 Update 2 Release Notes.

Currently, there is no resolution for ESXi 5.1 hosts.

If you are using these CPUs, I would suggest you either update your host's BIOS or upgrade ESXi to version 5.5 Update 2.

duenni
  • Hello duenni, thanks for your reply. We installed both updates (BIOS and VMware) and hope the error will not occur again. I'm going to mark this as the answer if there are no bluescreens in the following weeks. The machine crashed very rarely, so it is hard to say after such a short time whether the updates solved it. I appreciate your suggestions though! – HannesS Feb 11 '15 at 14:22
  • That's totally fine. Hope it solves your problem. – duenni Feb 11 '15 at 14:27
  • No crashes for the last month. I guess we found the culprit here. Thanks for the information and suggestions! I will mark your entry as the answer to this problem. – HannesS Mar 03 '15 at 08:44