
I came back from the weekend and this old IBM Netfinity 5100 server is extremely slow.

It's running Windows Server 2003 Standard and has been working great for years.

Today it took about 4 hours to boot up. Once in Windows, it takes minutes to pull up menus.

I ran some diagnostics and I am getting a bunch of errors from the RAID system. I ran a test on each logical drive, and all of them came back fine. The tests say the RAID configuration is fine.

Can any of you point me in the right direction?

bladefist

3 Answers


Disk problems usually lead to insane timeouts and performance problems even when the disks don't report a fault (i.e. the error correction manages to work around the bad spots, but it retries operations for so long before giving up that the whole system slows down)... but as pauska mentioned, perhaps there's even a degraded array there (though that alone normally shouldn't slow the host OS down this much).
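
Those retries usually leave a trail in the System event log too (timeout and controller-error events). As a minimal sketch, assuming Python with the pywin32 package is available on the box, something like this would list recent disk-related entries; the source names are placeholders you'd adjust for whatever SCSI miniport driver the ServeRAID card uses:

    import win32evtlog  # pywin32

    # Source names vary with the storage driver; "disk" is the Windows disk
    # class driver, "scsi" is a placeholder to adjust for your miniport.
    DISK_SOURCES = ("disk", "scsi")

    log = win32evtlog.OpenEventLog(None, "System")
    flags = (win32evtlog.EVENTLOG_BACKWARDS_READ |
             win32evtlog.EVENTLOG_SEQUENTIAL_READ)
    checked = 0
    while checked < 500:                       # scan the 500 newest records
        records = win32evtlog.ReadEventLog(log, flags, 0)
        if not records:
            break
        for rec in records:
            checked += 1
            if rec.SourceName.lower() in DISK_SOURCES:
                # The low 16 bits are the event ID shown in Event Viewer
                print(rec.TimeGenerated, rec.SourceName, rec.EventID & 0xFFFF)
    win32evtlog.CloseEventLog(log)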

Oskar Duveborn
  • Timeouts were the errors I was getting in the original test. So far I don't have any tools available to see which disk it is, or whether it's the RAID card itself. Any way to nail this down? – bladefist Nov 30 '09 at 19:25
  • Sorry for the delay. It turned out to be a failed drive in the array, but the failure light on the drive never lit up. – bladefist Mar 30 '11 at 19:59

What kind of tests are saying that the RAID configuration is fine? Have you tried rebooting the server and going into the RAID configuration utility to see if the array is degraded or rebuilding? What RAID level is it running? Are there any LEDs on the disks that you can inspect to see if a disk is marked as dead?

pauska
  • I went into the IBM diagnostic tests. The whole RAID configuration failed one of the 10 or so tests. So I booted into the RAID configuration utility, and it shows everything completely fine. The LEDs on the outside are all green and normal. Since the RAID only failed one test, I'm wondering whether that means anything at all. – bladefist Nov 30 '09 at 19:17
  • Sporadically failing tests usually point to hardware in the process of dying. Please try to run some kind of I/O-intensive application (like a disk defrag) while keeping an eye on the activity LEDs on your disks. If they keep stalling, you probably have either a disk about to die or a RAID controller with a cache problem. Have you checked the lifetime of the battery on your controller (if there is a battery)? – pauska Nov 30 '09 at 19:30
  • I haven't checked the battery. Would the battery be causing something like this? From what I have read, the battery is only used in the case of a power failure, so as not to lose data. – bladefist Nov 30 '09 at 20:05
  • Many RAID controllers disable the write cache if the battery is faulty, and no write cache on a parity RAID like 5 or 6 = SLOW (see the sketch after these comments). – pauska Nov 30 '09 at 20:34
  • It's a ServeRAID 4L card. I don't believe there is a battery. – bladefist Nov 30 '09 at 21:01
  • Do you have an active terminator (usually with an LED on it) at the end of the SCSI chain inside the server? If so, try replacing it if you have a spare on hand. – pauska Nov 30 '09 at 22:04
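
To put a rough number on the write-cache point above: a parity RAID that has lost its write cache turns every small synchronous write into a read-modify-write cycle. A back-of-the-envelope sketch (the IOPS figure is an illustrative assumption, not a measurement from this server):

    # Why a parity RAID with its write cache disabled feels so slow:
    # each small random write becomes a read-modify-write cycle.
    UNCACHED_IOS_PER_SMALL_WRITE = 4   # read old data, read old parity,
                                       # write new data, write new parity
    DISK_RANDOM_IOPS = 80              # rough figure for a 10k RPM SCSI disk

    writes_per_second = DISK_RANDOM_IOPS / UNCACHED_IOS_PER_SMALL_WRITE
    print(f"~{writes_per_second:.0f} small synchronous writes/sec without write cache")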

Try opening up perfmon and, when it (eventually) loads, add counters for disk queue length.

If you have large queues building up, you know there's a problem with some element of the disk subsystem.
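
If the perfmon GUI itself is too painful to drive on a box this slow, the same counter can be sampled from a script. A minimal sketch using pywin32's PDH bindings, assuming the English counter names on Windows Server 2003:

    import time
    import win32pdh  # pywin32

    # Queue length averaged across all physical disks
    COUNTER = r"\PhysicalDisk(_Total)\Avg. Disk Queue Length"

    query = win32pdh.OpenQuery()
    counter = win32pdh.AddCounter(query, COUNTER)

    win32pdh.CollectQueryData(query)          # prime the counter
    for _ in range(10):
        time.sleep(1)
        win32pdh.CollectQueryData(query)
        _, value = win32pdh.GetFormattedCounterValue(counter, win32pdh.PDH_FMT_DOUBLE)
        print(f"Avg. Disk Queue Length: {value:.1f}")

    win32pdh.CloseQuery(query)

Sustained values well above the number of spindles in the array point at the disk subsystem rather than CPU or memory.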

First guess would be the array battery, then the array card, then the cabling and disks.

See if the server will boot up OK from a Linux LiveCD or if that takes an extended period also. It'll mostly leave the disk subsystem alone and run from RAM.
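
If you want a number rather than a feel from the LiveCD, a rough sequential-read test of the raw array device will show whether the slowness follows the disk subsystem. A minimal sketch, assuming Python 3 is on the LiveCD and that the ServeRAID logical drive shows up as /dev/sda (adjust the device name as needed):

    # Rough sequential-read throughput check to run from the LiveCD.
    # /dev/sda is an assumption -- substitute whatever block device the
    # RAID controller presents. Requires root; reads only, writes nothing.
    import time

    DEVICE = "/dev/sda"      # assumed device name for the logical drive
    CHUNK = 1024 * 1024      # read in 1 MiB chunks
    TOTAL = 256 * CHUNK      # read 256 MiB in total

    start = time.time()
    read = 0
    with open(DEVICE, "rb", buffering=0) as dev:
        while read < TOTAL:
            data = dev.read(CHUNK)
            if not data:
                break
            read += len(data)
    elapsed = time.time() - start

    print(f"Read {read // (1024 * 1024)} MiB in {elapsed:.1f}s "
          f"({read / (1024 * 1024) / elapsed:.1f} MiB/s)")

A healthy SCSI array of that era should manage tens of MiB/s sequentially; single-digit figures or long stalls suggest a dying disk or controller rather than anything in Windows.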

Chris Thorpe