2

Can someone give me pointers on where to look to debug why this fresh install of Windows Server 2008 R2 crashes once a week? It runs an untweaked MS SQL Server R2 that serves just one ~5 GB database to 2 clients.

(All updates applied, no other software running, no other roles. No Hyper-V, running on bare metal. Intel Core i5 660 @ 3.33 GHz, 16 GB RAM, 64-bit Windows Server R2)

UPDATE:

I looked into the logs and filtered the Windows > System log for critical and error entries only. I found (translated into English):

ERROR: Service Control Manager; the service "SSPORT" could not be started, file not found

CRITICAL: Kernel-Power

These are the only severe-looking entries in this log, and they are possibly unrelated. There is nothing in the Security log, and in the Application log there is only MSSQL complaining that it can't connect to the reporting server right after a restart (which, according to the MS KB, is normal after a restart).

isync
  • 1
    Anything in the logs? – xeon Dec 21 '11 at 19:03
  • All I can find in the noisy Windows logs is kernel power off, restart etc. Where should I look? – isync Dec 21 '11 at 19:08
  • 2
    Have you checked for a .dmp file? – Nixphoe Dec 21 '11 at 19:25
  • Now, after your hint. "Write debugging information" is configured to write to %system%\Memory.dmp (default, never changed it) but there is no .dmp file in C:\Windows\ ! – isync Dec 21 '11 at 19:47
  • 1
    "SSPORT" is a Samsung printer-related service. AFAIK such printers work well without it (as it always fails), so you can disable or even delete this service (with sc.exe, from the registry or any other way you prefer) to avoid this error in the event log. But it seems this driver cannot cause a BSOD, so check the DMP file as specified above to find another possible source of the problem... – Sergey Dec 21 '11 at 20:21
  • SSPORT being Samsung makes sense, as someone else recently installed a new Samsung printer. And yes, it can't be the root of evil, as the crashes far precede the new printer. – isync Dec 21 '11 at 20:30

4 Answers

3

Can someone give me pointers where to look

Check the Event Logs. Log files are the first place any administrator of any device or operating system should look. Twice. Always. Forever. No exceptions.

Server 2008 offers great filtering features for the event logs, so you can search by criticality, application source, event ID and so on. If you spend a few good hours crunching data, you should be able to reconstruct the history of the crashes and get a good idea of what's going wrong.
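
The same kind of filter can also be scripted. Here is a rough Python sketch that builds an XPath filter for Critical and Error events and hands it to `wevtutil`, the command-line tool that ships with Windows. The helper names (`build_level_query`, `build_wevtutil_command`, `query_events`) are made up for this illustration, and the default of 50 events is an arbitrary assumption.

```python
import subprocess

# Numeric levels used by the Windows event log: 1 = Critical, 2 = Error, 3 = Warning.
LEVELS = {"critical": 1, "error": 2, "warning": 3}


def build_level_query(level_names):
    """Build an XPath filter that matches events at any of the given levels."""
    clauses = " or ".join(f"Level={LEVELS[name]}" for name in level_names)
    return f"*[System[({clauses})]]"


def build_wevtutil_command(log_name, level_names, count=50):
    """Return the argument list for a newest-first wevtutil query."""
    return [
        "wevtutil", "qe", log_name,
        f"/q:{build_level_query(level_names)}",
        f"/c:{count}",   # stop after this many events
        "/rd:true",      # read newest events first
        "/f:text",       # human-readable output
    ]


def query_events(log_name, level_names, count=50):
    """Run the query on a Windows host and return wevtutil's text output."""
    cmd = build_wevtutil_command(log_name, level_names, count)
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

On the server itself, `print(query_events("System", ["critical", "error"]))` should list the most recent severe entries from the System log; pass "Application" instead to look at the application side.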

If not, then blame stray alpha particles.


EDIT

Of course, I was remiss in delving deeper into the concept of event logs. I focused on the operating system. However, most enterprise-grade hardware also has event logs. If the OS is seemingly unaware of any problem on its hands, and yet the server reset itself, then perhaps you have faulty hardware that's tripping a restart response. I would suggest looking through any hardware logs that may exist for your server.

For example, in HP hardware that has an ILO card, you can sift through hardware logs for any events that might have occurred. Perhaps there was some PSU problem.

Going backwards even further, perhaps there was an issue with the PDU that your server is plugged into. Sort through those logs to see if there was some kind of power cycle that was tripped.

Trace the problem back from the top down: Application -> Services -> Operating System -> Server Hardware -> Power Distribution. Each link in that chain will likely have some reporting mechanism that you can sift through to see a history of what has happened.


EDIT 2

Egads! I am a fool's fool! I left out the other most important place to look. When picking up the pieces after an OS crash, memory dumps can lead you to the scene of the crime, the motive and the murder weapon. Once you learn how to analyze Windows crash dump files, you'll be a master detective.
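
Before analyzing anything you need to know whether a dump was written at all. A minimal Python sketch for that check follows, assuming the default locations have not been changed: `MEMORY.DMP` directly under the Windows directory and small dumps in a `Minidump` subfolder. The function name `find_dump_files` is hypothetical.

```python
import os
from pathlib import Path


def find_dump_files(system_root=None):
    """Return crash dump files found in the default Windows locations.

    Assumes the defaults: a full or kernel dump at <SystemRoot>/MEMORY.DMP
    and small dumps under <SystemRoot>/Minidump/.
    """
    root = Path(system_root or os.environ.get("SystemRoot", r"C:\Windows"))
    candidates = [root / "MEMORY.DMP"]
    minidump_dir = root / "Minidump"
    if minidump_dir.is_dir():
        candidates.extend(sorted(minidump_dir.glob("*.dmp")))
    # Keep only the paths that actually exist as files.
    return [path for path in candidates if path.is_file()]
```

An empty result after a crash is itself a clue: the system may have gone down without a Stop error (for example on power loss), or it was unable to write the dump.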

Wesley
  • Actually there are very few entries right before the restart happened, it seems, and only info entries. Win seems to attract alpha particles. Okay, any more help? – isync Dec 21 '11 at 20:07
  • 1
    @isync I updated my answer – Wesley Dec 21 '11 at 20:19
  • Thanks! But I found no .dmp files (see above comments). Anyway, everything here is indicating it's a hardware failure thing, RAM, PSU or mainboard. Sadly my hardware is not supported by WS2008R2 (FujitsuSiemens W380, no support) – isync Dec 21 '11 at 21:10
3

CRITICAL: Kernel-Power

This sounds like a hardware issue. Try replacing the power supply. It could be something else, as well (weak capacitor on the motherboard?), but the power supply will be the easiest (and cheapest) place to start.

Joel Coel
  • Really? So this log entry *always* indicates that power was removed by hardware means? It never means 'reboot after crash' or 'crash leading to reboot'? – isync Dec 21 '11 at 20:18
  • 1
    That's not what I'm saying. It might not mean you lost power, it might just mean a crash. But what do you think caused the crash? Most likely, it's a hardware issue. This could also mean a driver problem, but hardware is the most likely. – Joel Coel Dec 21 '11 at 20:30
2

To address this issue, Microsoft has published a knowledge base (KB) article:

Windows Kernel event ID 41 error in Windows 7 or in Windows Server 2008 R2: "The system has rebooted without cleanly shutting down first".

Click the following link to view the article in the Microsoft Knowledge Base:

http://support.microsoft.com/kb/2028504

Anyway, it looks like a 100% hardware problem. Could you please post your hardware configuration so we can check? According to Google there are many possible hardware-related causes of this error, from MB/CPU/video card incompatibility to BIOS settings or buggy old device drivers (HD audio and so on).
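
The event ID 41 record carries a `BugcheckCode` field in its event data, which helps separate a Stop error from a plain power loss or hard reset. Below is a small Python sketch that reads that field from the XML of a single event, as exported with `wevtutil qe System /f:xml` or copied from the XML view in Event Viewer. The function names `bugcheck_code` and `describe` are invented here, and the wording of the interpretation is only a rough guide.

```python
import xml.etree.ElementTree as ET

# XML namespace used by Windows event records.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def bugcheck_code(event_xml):
    """Return the BugcheckCode from one Kernel-Power event, or None if absent."""
    root = ET.fromstring(event_xml)
    for data in root.findall("e:EventData/e:Data", NS):
        if data.get("Name") == "BugcheckCode":
            return int(data.text or "0")
    return None


def describe(code):
    """Give a rough reading of the code; a hint, not a diagnosis."""
    if code is None:
        return "no BugcheckCode field in this event"
    if code == 0:
        return "no Stop error recorded: power loss, hard reset or hang is plausible"
    return f"Stop error 0x{code:08X} recorded: look for a memory dump"
```

A value of zero fits the hardware theory; a non-zero value means Windows did bugcheck, and the code (shown in hex by `describe`) is the thing to search for.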

Sergey
  • FujitsuSiemens W380 server, good shape. But I don't trust the guy who crammed in the RAM and I am not sure if the mainboard (type unknown, it's a remote system) is prone to errors. Signs point to hardware, I will track it down once I get hands on actual hardware. Thanks! – isync Dec 21 '11 at 21:14
2

I'm not sure if you've found the answer yet, but I've had the same problem: Windows Server 2008 R2 kept restarting at roughly the same time once a week. I noticed it was happening a bit earlier each time but couldn't figure out why, until I stumbled upon an error message in the deepest dredges of the event log. It was not in the administrative area or the main logs (Security etc.) but under this one:

Event Viewer -> Application and Services Logs -> Microsoft -> Windows -> Server Infrastructure Licensing -> Operational

Look for the entries logged right after the server has started back up; in my case it shut down at 9:18 AM. One of the first errors after startup should mention, on the General tab, a configuration issue and the server needing to be a domain controller. The next error in line should say that the domain controller check does not meet certain licensing policy conditions, and that if this is not fixed the system will automatically shut down in 6 days, 23 hours and 30 minutes, which is almost a week later.

I really hope this helps your situation; I was at a loss for the past 3 weeks because of this.

J. Russell
  • I don't have the "Server Infrastructure Licensing" subtab in "Windows". Does that mean this error doesn't apply? I am on a German system, and found this strange error: "Die Konfiguration des Szenarios "{fd5aa730-b53f-4b39-84e5-cb4303621d74}" ist fehlerhaft oder wurde explizit im WDI-Registrierungsnamespace deaktiviert. Das Szenario wird vom Diagnoserichtliniendienst ignoriert." (roughly: the configuration of scenario xxx is faulty or was explicitly disabled in the WDI registry namespace; the scenario will be ignored by the Diagnostic Policy Service). Your hint here sounded promising, but these reboots also don't seem to happen that regularly every week – isync Jan 12 '12 at 19:43
  • That's pretty strange actually; it was my understanding that this was a pretty default service for the server editions of Windows. (Mayhaps it's under the German word for license, but I imagine you tried that already.) ---EDIT--- I'll see if I can track down any differences between the two versions, etc. – J. Russell Jan 12 '12 at 20:07