0

I'm getting BSOD on a w2k3 machine either during startup (before the mouse cursor or CTRL-ALT-DEL login has appeared), or if I'm lucky enough the machine boots, but sooner or later the same BSOD appears.

The message is :

IRQL_NOT_LESS_OR_EQUAL

I haven't been able to get the Stop Code as the screen disappears too quickly and it reboots (it never dumps the core). Pause isn't holding the blue screen.

Event Viewer is showing no unusual activity immediately prior to the crash. The services being started and activities taking place etc. are wildly different between crashes.

The Bootlog indicated that one crash occurred after bluetooth drivers were loaded. I didn't need those, so removed them, but the problem persists.

I suspected memory. There are 2 x 512MB SIMMs in 2 slots. I removed each one from either slot and booted with only half the memory. I also swapped the slots for both SIMMs, and tried one at a time. In all cases the BSOD continued to occur (mostly at boot). I feel this rules out bad memory, since I find it highly unlikely that 2 memory modules and/or 2 slots would go bad at the same time.

I did however run memtest, and it reported bad memory -- could it be the memory controller module on the mainboard?

No new drivers or applications have been added to the system prior to this problem starting. The machine has been running for 5 years without much incident.

I have done a complete system cleanup, scandisk (full), reg-check, checked CMOS settings, and removed a lot of old apps and junk in the hope of tuning it all up. I've also removed the CD-ROM drive (not used much), reinserted the hard drive in its IDE slot, unplugged and plugged everything back in several times, and physically cleaned its innards. I checked that the fans are all working.

Problem is persisting!

rwired

6 Answers

1

Try this Microsoft Article on troubleshooting stop errors.

Jack B Nimble
  • This is a superb doc! Thanks for bringing it to my attention. I'm now able to get the stop parameters 0x0000000a (0xF78A7258, 0x00000002, 0x0000001, 0x80505496). There are no driver address messages. But the stop parameters are the same on every crash (whether it was at bootup, or post-boot) – rwired Jul 14 '09 at 07:20
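For anyone reading those numbers: the four parameters of a 0x0000000A stop have fixed meanings (address referenced, IRQL, access type, address of the instruction that made the reference). Below is a minimal Python sketch that decodes them. The names `StopA` and `decode_stop_a` are made up for illustration, and the read/write table only covers the simple 0/1 encoding used by older Windows versions.

```python
from dataclasses import dataclass

# Parameter 3 on older Windows versions: 0 = read, 1 = write.
# Newer versions use a bit field, which this sketch does not handle.
ACCESS_TYPES = {0: "read", 1: "write"}


@dataclass
class StopA:
    address: int      # parameter 1: memory address that was referenced
    irql: int         # parameter 2: IRQL at the time of the reference
    access: str       # parameter 3: kind of access that was attempted
    instruction: int  # parameter 4: address of the instruction that did it


def decode_stop_a(p1: str, p2: str, p3: str, p4: str) -> StopA:
    """Decode the four STOP 0x0000000A parameters given as hex strings."""
    address, irql, access, instruction = (int(p, 16) for p in (p1, p2, p3, p4))
    label = ACCESS_TYPES.get(access, "unknown ({})".format(access))
    return StopA(address, irql, label, instruction)


if __name__ == "__main__":
    # Values exactly as quoted in the comment above.
    s = decode_stop_a("0xF78A7258", "0x00000002", "0x0000001", "0x80505496")
    print("{} of 0x{:08X} at IRQL {}, from instruction at 0x{:08X}".format(
        s.access, s.address, s.irql, s.instruction))
```

With the values quoted above this reads as a write to 0xF78A7258 at IRQL 2 (DISPATCH_LEVEL on x86). Identical parameters on every crash suggest the same code path failing each time, which does not by itself say whether the cause is a driver or hardware.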
1

I can't know for sure, but IRQL_NOT_LESS_OR_EQUAL has always been a hardware conflict or driver problem for me. Boot it in Safe Mode and see if you can get it to crash. If it BSODs in Safe Mode, it's probably a hardware problem. If it doesn't BSOD in Safe Mode, it's probably a driver issue.

Also, how did you uninstall the Bluetooth driver? It's possible that an application or driver uninstaller actually left the driver running. If possible, check Device Manager (View-Show Hidden Devices) to see if you can determine which .SYS file(s) were included in the Bluetooth driver. You may also be able to extract (but not install) the original driver to see which .SYS files it includes. Once you know the name of the driver file(s), try to see if it still exists on the server. You may be able to disable the driver from Device Manager, but I have had to go as far as renaming the .SYS driver file in Safe Mode to prevent a driver from loading.

Carl C
  • You were right about the bluetooth driver. The offending file BTHID.SYS was still loading according to ntbtlog.txt, despite having uninstalled the 3rd party software it came with and removing from Device Mgr. I renamed the file and now it doesn't load. But the BSOD is still occurring (at same address). The bootlog shows that the crash occurs after any number of different drivers has loaded. I can't find a common thread. In the Event Log it seems to happen shortly (but not immediately) after the IPSec service had started. I've disabled that service, but it still occurs. – rwired Jul 15 '09 at 06:46
  • Generally it does not BSOD in Safe Mode, unless I enter Safe Mode with Network Support. In one instance it also BSODed when I elected not to load SPTD.SYS, which is a question asked during Safe Mode startup. Therefore I would agree with you that it looks like a driver/software issue. However, EVERY other test I've run points to hardware (e.g. not being able to pinpoint the driver or service responsible, and the crash occurring randomly, especially if booting from a cold CPU). – rwired Jul 15 '09 at 06:52
  • This is a tough one. Since you indicated that "Generally it does not BSOD in Safe Mode" I think I'd go back to your initial memtest results. It's possible you do have a memory-related motherboard problem. If this is a machine with a warranty, it might be best to ask the vendor to replace both the motherboard and the memory as a precaution. – Carl C Jul 15 '09 at 16:38
  • Just to clarify, I took "Generally it does not BSOD in Safe Mode" to mean it does BSOD in Safe Mode, even without network support enabled. – Carl C Jul 15 '09 at 16:44
  • It would not BSOD in Safe Mode *unless* I interrupted certain drivers from loading -OR- if I enabled network support. Since it was a spurious crash (not always occurring) I continued to suspect hardware. But in fact it turns out it was the network drivers all along (see my solution post) – rwired Jul 17 '09 at 05:24
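One way to narrow this kind of thing down is to compare the boot log from a stable Safe Mode boot with the one from a crashing Safe Mode with Networking boot: drivers that appear only in the second are the first suspects. Here is a rough Python sketch of that comparison. It assumes you have saved a separate copy of ntbtlog.txt for each boot, that the log marks loaded drivers with lines beginning "Loaded driver ", and that the file is UTF-16 encoded; the function names and `examplenic.sys` are made up for illustration.

```python
from pathlib import Path

PREFIX = "Loaded driver "


def loaded_drivers(log_text):
    """Return the lower-cased file names of drivers marked as loaded."""
    names = set()
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith(PREFIX):
            path = line[len(PREFIX):].strip()
            # Keep only the file name: the log mixes full paths and bare names.
            names.add(path.replace("/", "\\").rsplit("\\", 1)[-1].lower())
    return names


def suspects(stable_log, crashing_log):
    """Drivers loaded in the crashing boot but not in the stable one."""
    return sorted(loaded_drivers(crashing_log) - loaded_drivers(stable_log))


def read_bootlog(path):
    # Assumption: the boot log is UTF-16; change the encoding if decoding fails.
    return Path(path).read_text(encoding="utf-16")


if __name__ == "__main__":
    # Inline sample data; examplenic.sys is a hypothetical driver name.
    safe = "Loaded driver \\WINDOWS\\system32\\ntoskrnl.exe\nLoaded driver disk.sys\n"
    net = safe + "Loaded driver \\SystemRoot\\System32\\DRIVERS\\examplenic.sys\n"
    print(suspects(safe, net))
```

On real logs you would call `suspects(read_bootlog(a), read_bootlog(b))` with the two saved copies. The resulting list is only a starting point, since a driver common to both boots can still misbehave once networking is active.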
0

The next thing I would check is the CPU. An overheating CPU can cause all kinds of strange errors. If you have another box with the same CPU, swap them out.

Also, you might want to replace the CPU fan and the heat sink compound. I know you said the fans were all spinning, but the heat sink might not be making good contact with the CPU.

Anthony Lewis
  • I left the machine off overnight, in a room with air-con. Today it refused to reboot, with the same error. I removed the heatsink from the CPU and cleaned it all again, then remounted it and checked there was a proper (thin) layer of thermal paste between the pads. It refused to boot again until the third try. It then crashed with the exact same stop parameters after a few minutes. After 4 more attempts it booted and is running now. This seems to rule out an overheating CPU. Possibly the opposite... could a cold solder spot on the mainboard be allowing the hardware to perform only AFTER heating up a bit? – rwired Jul 14 '09 at 07:17
0

On the topic of the CPU: I've seen a layer of dust build up between the top of the heatsink fins and the fan. Taking the fan off and removing the dust has fixed many servers that were spontaneously rebooting and crashing.

DanBig
  • Today I again had difficulty booting. I booted in Safe Mode, then rebooted in normal mode. I've left the machine running for several hours and it hasn't crashed. To me this seems to support the theory that this machine is more stable when it's hotter, rather than it being an overheating issue. Is this reasonable? – rwired Jul 15 '09 at 06:54
  • If you are able to keep it powered on for several hours, that does seem to rule out heat. – DanBig Jul 15 '09 at 12:09
0

Did you read your dump file? Find your dump file and process it through the Microsoft debugging tools; it will show you the likely driver or item that is causing the BSOD. If you are not sure how, post back. gd

You can see the IRQ values under Device Manager.

dasko
  • Actually, check your interrupt request values: you might have items on the same value and could be having issues with them failing. I had the same IRQ value for SATA drivers and a NIC, and I was failing on it. gd –  Jul 13 '09 at 19:51
  • There are no IRQ conflicts. I can't get it to save any kind of dump file, despite changing the settings in the System control panel. It simply crashes and restarts every time, even when I change the settings to not restart and to write a complete dump file. – rwired Jul 15 '09 at 01:55
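When the control panel settings don't seem to stick, it can be worth checking what is actually stored in the registry under HKLM\SYSTEM\CurrentControlSet\Control\CrashControl. The Python sketch below reads the `AutoReboot`, `CrashDumpEnabled` and `DumpFile` values and summarises them. The helper names are made up for illustration, the dump type table only covers the classic values 0 to 3, and the registry read only works on Windows.

```python
import sys

# Classic CrashDumpEnabled values; newer Windows versions define more.
DUMP_TYPES = {
    0: "none",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump",
}


def describe_crash_control(values):
    """Turn CrashControl registry values (as a dict) into readable notes."""
    notes = []
    dump = values.get("CrashDumpEnabled")
    notes.append("dump type: {}".format(DUMP_TYPES.get(dump, "unrecognised")))
    if values.get("AutoReboot") == 1:
        notes.append("AutoReboot is on: the blue screen will not stay visible")
    if dump == 0:
        notes.append("dumps are disabled")
    if "DumpFile" in values:
        notes.append("dump file: {}".format(values["DumpFile"]))
    return notes


def read_crash_control():
    """Read the live settings from the registry (Windows only)."""
    import winreg
    key_path = r"SYSTEM\CurrentControlSet\Control\CrashControl"
    found = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        for name in ("AutoReboot", "CrashDumpEnabled", "DumpFile"):
            try:
                found[name], _type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                pass  # value not present on this system
    return found


if __name__ == "__main__":
    if sys.platform == "win32":
        for note in describe_crash_control(read_crash_control()):
            print(note)
```

Even with correct settings a dump may not be written: a complete dump generally needs a page file on the boot volume at least as large as physical RAM, and a crash very early in boot can happen before the dump path is usable.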
0

Problem solved.

The fact that it would BSOD in Safe Mode only when Network Support was enabled led me to suspect the network card. Sure enough, without the network card plugged in it ran fine. Furthermore, after I upgraded the network card drivers to the latest version, the problem went away. So it was software related all along. I suspect the old drivers had been corrupted in a minor way, or the network environment had changed in some way that caused a bug in the old drivers to manifest.

rwired