1

The server at my company has been acting strangely for as long as I have known it. Since it is a production server, we rarely do a complete shutdown or restart, but when we do, it sometimes hits a BSOD several times before it finally boots back into Windows (nothing was changed in between, just normal resets).

I expected to get a dump file after each BSOD, but strangely I never got one. I have checked the startup and recovery configuration in the advanced system settings many times to make sure it is set to create a dump file, but I still haven't got any so far.

The error at the BSOD is specifically like this:

0x0000007B (0xFFFFF880009A9928, 0xFFFFFFFFC0000034, 0x0000000000000000, 0x0000000000000000)

and it is running Windows Server 2008 R2 Enterprise on an HP ProLiant DL120 G6 server.

I have installed the latest updates from Windows, checked for hardware and configuration issues, and even got support from HP, who said it must be an OS error.

From some googling, some people say that it's a filter driver error (based on the second parameter ending in 0x34), so I tried removing all the filter driver instances, with no luck.
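For reference, here is a minimal Python sketch that splits the stop line quoted above into its parts. The function name and the two lookup tables are hypothetical helpers, not part of any Windows tool. Treating the second parameter as an NTSTATUS value is an assumption that follows the "0x34" reading mentioned above; the names for 0x7B and 0xC0000034 are the standard ones.

```python
import re

# Tiny lookup tables covering only the values in this stop line (illustrative).
BUGCHECK_NAMES = {0x7B: "INACCESSIBLE_BOOT_DEVICE"}
NTSTATUS_NAMES = {0xC0000034: "STATUS_OBJECT_NAME_NOT_FOUND"}

STOP_LINE = ("0x0000007B (0xFFFFF880009A9928, 0xFFFFFFFFC0000034, "
             "0x0000000000000000, 0x0000000000000000)")


def decode_stop_line(line):
    """Split a 'CODE (P1, P2, P3, P4)' stop line into code and parameters."""
    values = [int(v, 16) for v in re.findall(r"0x[0-9A-Fa-f]+", line)]
    if len(values) != 5:
        raise ValueError("expected a stop code followed by four parameters")
    code, params = values[0], values[1:]
    # Assumption: parameter 2 carries an NTSTATUS sign-extended to 64 bits,
    # so only the low 32 bits are looked up.
    status_code = params[1] & 0xFFFFFFFF
    return {
        "bugcheck": BUGCHECK_NAMES.get(code, hex(code)),
        "params": params,
        "status_code": status_code,
        "status": NTSTATUS_NAMES.get(status_code, hex(status_code)),
    }


if __name__ == "__main__":
    print(decode_stop_line(STOP_LINE))
```

Read this way, the stop code is an inaccessible boot device error, and the low 32 bits of the second parameter correspond to an "object name not found" status.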

Any ideas how I could fix this or at least troubleshoot it?

Update: I forgot to mention that entering safe mode (any kind of safe mode) also triggers the BSOD, so it's not an option.

Syakur Rahman

2 Answers

1

I would look at the dump files and see if there is an obvious way to identify a driver issue.

http://blogs.technet.com/b/askcore/archive/2008/11/01/how-to-debug-kernel-mode-blue-screen-crashes-for-beginners.aspx#3476888

http://blogs.technet.com/b/juanand/archive/2011/03/20/analyzing-a-crash-dump-aka-bsod.aspx

These steps sometimes give an obvious answer quite quickly. If not, I would not spend much time looking further with this method, because that needs very specialised knowledge. Microsoft support would be able to pursue the investigation.

John Auld
  • I wish I could do that. The thing is, it seems the BSOD never generates a dump file at all. Other than the usual startup and recovery configuration under the advanced system settings, is there any other way to get the dump files? – Syakur Rahman Jun 27 '14 at 02:53
  • I missed the bit where you said that no dump is created.... If your machine is set to write a kernel memory dump, try the small memory dump option instead. – John Auld Jun 27 '14 at 03:25
  • Yes, I have tried both in the past; neither worked. I wonder if there is some other conflicting configuration. – Syakur Rahman Jun 27 '14 at 03:40
  • There are some KB articles listed at the end of the following article. Perhaps one will help. If not, it's a good reference. http://blogs.msdn.com/b/ntdebugging/archive/2010/04/02/how-to-use-the-dedicateddumpfile-registry-value-to-overcome-space-limitations-on-the-system-drive-when-capturing-a-system-memory-dump.aspx – John Auld Jun 27 '14 at 05:11
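
The dump settings discussed in these comments are stored under the `CrashControl` registry key, which is also where the `DedicatedDumpFile` value from the linked article lives. The sketch below is one hedged way to read them back and double-check what the GUI shows. The function names are hypothetical, `read_crash_control` only works on Windows, and the dump-type table covers only the values 0 to 3 used on Windows Server 2008 R2.

```python
# Meaning of the CrashDumpEnabled value on Windows Server 2008 R2.
DUMP_TYPES = {
    0: "none",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump",
}

KEY_PATH = r"SYSTEM\CurrentControlSet\Control\CrashControl"


def read_crash_control():
    """Return all values under the CrashControl key as a dict (Windows only)."""
    import winreg  # imported here so the rest of the sketch loads anywhere
    values = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
        value_count = winreg.QueryInfoKey(key)[1]
        for index in range(value_count):
            name, data, _value_type = winreg.EnumValue(key, index)
            values[name] = data
    return values


def summarize_crash_control(values):
    """Turn raw CrashControl values into a short, readable summary."""
    raw = values.get("CrashDumpEnabled")
    if raw is None:
        dump_type = "not set"
    else:
        dump_type = DUMP_TYPES.get(raw, f"unknown ({raw})")
    return {
        "dump_type": dump_type,
        "dump_file": values.get("DumpFile"),
        "minidump_dir": values.get("MinidumpDir"),
        "dedicated_dump_file": values.get("DedicatedDumpFile"),
        "auto_reboot": bool(values.get("AutoReboot", 0)),
    }
```

On the server itself, `summarize_crash_control(read_crash_control())` would show which dump type and target paths are actually in effect.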
1

This is likely a firmware issue with the server hardware.

Many organizations and systems administrators don't take the time to update and maintain the firmware of their HP ProLiant servers. It requires a different mindset than a Dell or Supermicro system that's less tightly-integrated.

You have an HP ProLiant DL160 G6 server, so that places the deployment date to 2008-2010, when that server and processor architecture was in wide use. A quick check of the firmware revisions and release notes shows the September 2011 update:

Problems Fixed:

Resolved an issue that may result in any of the following conditions: operating system stops responding, unexpected system reset, Blue Screen when using a Microsoft Windows operating system, kernel panic when using a Linux operating system, or Purple Screen when using VMware ESX. A message may be displayed by the operating system or logged in the Event Log when this issue occurs indicating an "Uncorrectable Machine Check Exception." However, there are instances where the system resets before the operating system displays an error message and instances where the Event Log contains no log entry when this issue occurs. This issue does not occur if the Intel C-State tech is configured to "disabled" or the C State package limit setting is set to "C1" or "C3". The system is susceptible to this issue in the default Intel C-State tech and C State package limit setting configurations.

Sounds like your problem, doesn't it?

The best approach to updating all of the firmware and components in your system (ILO, NIC, RAID, BIOS, etc.) is to download the bootable HP Service Pack for ProLiant DVD image and allow it to update everything on the server.

ewwhite
  • Just a correction: the server I am using is a DL120, not a DL160. Nevertheless, I did give it a try, but the issue is still there. As a further note, I installed a BIOS firmware update just weeks ago, when HP support told us to do so. – Syakur Rahman Jul 02 '14 at 03:59