2

We have an 8-port 3ware 9650SE RAID card for our main disk array. We had to bring the server down for a pending power outage, and when we turned the machine back on, the RAID card never started.

This card has been in service for a couple years without problems, and was working up until the shutdown.

Now, when we turn the machine on, the BIOS option ROM that normally kicks in before the bootloader doesn't show up, none of the drives start, and when the OS tries to access the device, it just times out.

The firmware on it has been upgraded in the past, so it's possible we've hit some sort of firmware bug.

We're using it in a Silicon Mechanics R272 machine with Gentoo for the OS. The OS eventually boots, but alas, without the card.

We've ordered a new one, but I'm worried that if we replace the card it won't recognize the existing array. Has anybody performed a card swap before?

Any help would be greatly appreciated.

Edit: These are the kernel errors we see:

3ware 9000 Storage Controller device driver for Linux v2.26.02.012.
3w-9xxx 0000:09:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
3w-9xxx 0000:09:00.0: setting latency timer to 64
3w-9xxx: scsi0: ERROR: (0x06:0x000D): PCI Abort: clearing.
3w-9xxx: scsi0: ERROR: (0x06:0x001F): Microcontroller not ready during reset sequence.
3w-9xxx: scsi0: ERROR: (0x06:0x0036): Response queue (large) empty failed during reset sequence.
3w-9xxx 0000:09:00.0: PCI INT A disabled
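
For anyone comparing their own logs against these, here is a minimal Python sketch that pulls the bracketed codes out of the `3w-9xxx` ERROR lines. The sample text is copied from the log above; the function name and the field names `group_code` and `error_code` are my own labels for illustration, not 3ware terminology.

```python
import re

# Sample lines copied from the kernel log above.
LOG = """\
3w-9xxx: scsi0: ERROR: (0x06:0x000D): PCI Abort: clearing.
3w-9xxx: scsi0: ERROR: (0x06:0x001F): Microcontroller not ready during reset sequence.
3w-9xxx: scsi0: ERROR: (0x06:0x0036): Response queue (large) empty failed during reset sequence.
"""

# "group_code" and "error_code" are hypothetical names for the two hex fields.
ERROR_RE = re.compile(
    r"3w-9xxx: (?P<host>scsi\d+): ERROR: "
    r"\((?P<group_code>0x[0-9A-Fa-f]+):(?P<error_code>0x[0-9A-Fa-f]+)\): "
    r"(?P<message>.*)"
)


def parse_3w_errors(text):
    """Return (host, group_code, error_code, message) for each ERROR line."""
    errors = []
    for line in text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            errors.append((
                match["host"],
                int(match["group_code"], 16),
                int(match["error_code"], 16),
                match["message"],
            ))
    return errors


if __name__ == "__main__":
    for host, group_code, error_code, message in parse_3w_errors(LOG):
        print(f"{host} {group_code:#04x}:{error_code:#06x} {message}")
```

Lines that are not ERROR lines (such as the PCI INT messages) are simply skipped, so the whole `dmesg` output can be passed in.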
HopelessN00b
antiduh
  • If the card has been in the machine for years, and this is one of the first times it's been offline, it's also possible that the mechanical connection was a little loose due to thermal expansion and contraction. I've seen this multiple times: a machine goes offline, and when it comes back, some card doesn't want to work. Re-seat the card (remove and reinsert it) and it magically comes back to life. It's one of the first things I do when I see something like this now. – Ali Chehab Apr 29 '10 at 23:34
  • Thanks for the input. We did try to re-seat the card many times, tried different PCIe slots, etc., all to no avail. – antiduh Apr 30 '10 at 01:39
  • Did the card have the back-up battery installed? If so, the back-up battery would preserve the (apparently bad) state of the card even through a hard power off of the host. Edit: or moving it to a different host. But that fixed it. Whoops. =) – rakslice Dec 29 '11 at 23:18
  • It's been a long time, but I recently replaced my old dead 3650SE-8i (cause of death: degraded HD) with a new one, and my array and all my data are as they should be, so have faith. Don –  Nov 01 '12 at 04:37

10 Answers

3

It's quite painless to swap 3ware cards.

Just make sure it's the same or newer model and that the firmware versions are the same. If the firmware versions are different, the disks won't import to the controller. (been there, done that)

Does the old card show up in lspci at all? I've had problems where the BIOS settings would get scrambled and cause the card to not show up at all. I had to reenable the PCI slot and also enable MSI for the 3Ware cards to appear again.
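
A rough sketch of that lspci check in Python, in case it helps to script it. It assumes `lspci` (from pciutils) is installed and that the controller's description contains the string "3ware", as it does in the line quoted in the comments on this answer; the function names are made up for illustration.

```python
import subprocess


def find_3ware_controllers(lspci_output):
    """Return the lspci lines whose description mentions 3ware."""
    return [
        line for line in lspci_output.splitlines()
        if "3ware" in line.lower()
    ]


def run_lspci():
    """Run lspci and return its text output (assumes pciutils is installed)."""
    result = subprocess.run(
        ["lspci"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `find_3ware_controllers(run_lspci())` on the affected machine returns an empty list if the card is not enumerated on the PCI bus at all. A non-empty result only shows that the bus sees the device; as this thread shows, the card can be listed by lspci and still fail to initialize.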

James
  • Yeah, this is what we see: "09:00.0 RAID bus controller: 3ware Inc 9650SE SATA-II RAID PCIe (rev 01)". That said, the fact that the drives don't start and the card's BIOS doesn't show up during boot isn't very encouraging. – antiduh Apr 29 '10 at 14:29
  • Also, what do you mean by 'enable MSI'? That's not a BIOS option I'm familiar with. We're using PhoenixBIOS on the mainboard, if it's any help. – antiduh Apr 29 '10 at 14:31
  • Ahh, google-fu needed some tweaking - http://www.mjmwired.net/kernel/Documentation/MSI-HOWTO.txt – antiduh Apr 29 '10 at 14:39
  • Yeah, this sounds like a totally different problem... our systems had a BIOS option to disable/enable MSI as well as the kernel bits. Good luck with the new card. – James Apr 29 '10 at 17:09
  • Oh - have you tried re-flashing the firmware/BIOS as well? You can do it via a command line tool or the 3dm2 GUI. – James Apr 29 '10 at 17:09
2

This is Dan who posted previously, this time I've created an account :)

Anyway, now that my data was pulled.. I decided to screw around with the card and success!!

  1. Downloaded LiveCD version of Ubuntu 10.04.3 LTS

  2. Booted Live and ensured the card was detected ('tail /var/log/messages | grep 3w-')

  3. Installed tw_cli from the following guy's repo: http://jonas.genannt.name

  4. Downloaded the latest firmware (2.08.00.009) from CodeSet 9.3.0.8 for the 9500S-8 from http://www.3ware.com/support/downloadpageprod.asp?pcode=9&path=Escalade9500SSeries&prodname=3ware%209500S%20Series

  5. Used tw_cli to flash the firmware (stock tw_cli from 3ware doesn't support this). I did not use the force flag, and flashed despite already having the same version.

  6. Rebooted when it told me so.

BIOS now comes up as expected!

RMA my !@#. Perhaps I should share this with 3Ware. Big thanks to everyone for listening.

Dan
2

Some info on using 3ware 9650 raid cards in modern, common motherboards:

  • Avoid full-size 9650 cards, as they don't work with newer motherboards: the card's BIOS fails to kick in after a soft reset. In older motherboards they work fine (tested in Core 2 motherboards).

  • The low-profile 9650SE cards were made later, and they work fine in modern (UEFI, etc.) motherboards.

  • They are still working (most of them made around 2007 perhaps?)

  • I have not seen a failing battery yet, after 8-9 years (using them in ideal conditions, with batteries always checked and charged).

  • You can switch cards, but use the same firmware (or newer if same version is not available). When building raids use the lower ports first, because you can also switch to a 9650 card with fewer ports easily as long as the higher ports are not used on the original card.

  • Avoid the first x16 PCI Express slot on the motherboard; some motherboards expect a video card there, which causes strange behavior.

  • Installing 3dm2 and the CLI works out of the box on Ubuntu (tested: 14.04 LTS, 16.04 LTS); just run the shell script from the install.

  • It's a pity that 3ware is no more, these are great products

  • If you still use them, sadly it's time to switch to something new. I'm afraid there is only LSI (now Broadcom) to consider.

  • After Avago and Broadcom merged, the Avago website was changed, and drivers/downloads for 3ware are harder to find.

azazil
  • 4 years later the cards are still working, no issues at all, fully supported by Ubuntu 18.04LTS, even cli and 3dm2 software is working fine. – azazil Aug 01 '20 at 16:20
1

You should be good. I haven't done it with that particular card, but I have with many other hardware RAID cards. The only thing I would suggest is to toss the new card in another machine, make sure it works, and check that it is at the same BIOS level as your old card; downgrade if you have to.

Zypher
1

3ware cards are excellent at array compatibility. Do ensure the firmware is no older than the old card's (as far as you can determine), and you probably want to try to keep within the same series if possible.

Keep those two in mind and it just works.
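
To make the "no older than" check concrete, here is a small Python sketch that compares two firmware version strings. It assumes the version is a dotted run of numbers like `2.08.00.009` (the form quoted elsewhere in this thread) and that any leading label can be dropped by taking the last whitespace-separated token; both are assumptions, so adjust it to whatever your controller actually reports. The function names are made up for illustration.

```python
def parse_version(version):
    """Turn a dotted version such as '2.08.00.009' into a tuple of ints.

    Assumption: if the string has a leading label, the numeric version
    is the last whitespace-separated token.
    """
    dotted = version.split()[-1]
    return tuple(int(part) for part in dotted.split("."))


def is_same_or_newer(replacement, original):
    """True if the replacement card's firmware is not older than the original's."""
    return parse_version(replacement) >= parse_version(original)


if __name__ == "__main__":
    old_fw = "2.08.00.009"  # example value only
    new_fw = "2.08.00.009"  # example value only
    if is_same_or_newer(new_fw, old_fw):
        print("Replacement firmware is the same or newer.")
    else:
        print("Replacement firmware is older; consider updating it first.")
```

Comparing tuples of integers avoids the trap of plain string comparison, where `"2.10"` would sort before `"2.9"`.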

LapTop006
  • Thanks, that's encouraging to hear. We're buying the exact same card we had before, don't want to change up anything unless we really have to. – antiduh Apr 29 '10 at 14:33
1

I happened to do some repetitive booting in a machine that had a 9500S-8 and it appears to have suffered the same fate. I came across an article for the 9650 from 3ware saying how to fix it. I couldn't believe 3Ware's solution of the only option being to RMA the card.

Anyway, I haven't been successful in applying any of the said magic to revive the BIOS. Thankfully after a couple of reboots in a different machine, it's detected properly after booting (BIOS still not coming up), detected the raid array and I'm able to mount it and pull my data.

Both Ubuntu and Fedora distros show all card info except one: BIOS string not found. I'm going to pull my data before I start screwing with firmware updates, in the meantime, antiduh, if you're still around and reading this, do you have any additional info about the Redhat version or drivers or other procedure I can try? I'm not convinced a firmware update will solve this..

DanBo
  • Checking through the Fedora releases, it was probably 12 or 13, likely 12, given that the machine we brought it to life in was a freshly installed machine. I don't remember anything about what drivers were installed, but they would've been whatever was available at the time. – antiduh Sep 07 '11 at 14:09
  • As for the procedure, we had been trying desperately to boot it in the original machine - leaving it on, leaving it off, booting multiple times with power on, booting multiple times with hard-power-off between boots, reseating it, switching slots. Finally we pulled it out of the Silicon Mechanics machine and stuck it in the Fedora machine. From what I remember going on, we only had to boot it once for the OS to be able to read it, twice to get the BIOS to kick back in. You'll want to play with power - leaving it out of the machine for a while, leaving it on in the machine for a while, etc. – antiduh Sep 07 '11 at 14:13
0

I have swapped an 8-port card for a 12-port card (edit: thinking about it, it was a 9500, not a 9650), and the other card detected the array, so based on my experience I would fully expect it to work.

James
0

We managed to bring the card back to life, magically. We took the card out of the machine and stuck it in a completely different machine running something redhat with very new drivers. The story goes that the first time it booted, the raid bios did not kick in during the boot (like we'd been seeing), but the kernel reported a lot of different errors. Eventually it was able to actually bring it up and then the next reboot the raid bios started working again and it booted cleanly. We put it back in the machine and everything came back to life.

To me, this sounds like a problem with microcode. I've seen some drivers for things like sound cards, soft RAIDs, video cards, etc. download some sort of microcode to the card when turning it on. If things went bad the last time that happened, or if the microcode got corrupted due to the power blip from the UPSes kicking in when we lost power (walls down the hall turned into a waterfall), then that would certainly explain what happened.

Figured I'd post an update for all future googlers.

Edit 3-Jan-2012: @rakslice made the point that these cards often have battery back-ups attached. We hadn't tried to remove the battery (didn't think of it), but it's a great idea. Anybody else having this problem may want to try the same. We're still not sure if we fixed it because the Fedora kernel did some magic handshake to recover the card, or if we happened to leave it unpowered long enough for something to reset.

HopelessN00b
antiduh
0

I've got a stable of 3Ware 9650SE cards and swapping is easy. I tested that before deploying, as I have 4- and 8-port cards. However, recently my experience with 3ware soured badly. It started with a hang on the backup box with 5 x 1.5TB drives. The controller was unstable when heavily loaded (just untarring a large tgz file) and would crash within a day of burn-in testing. A spare controller worked fine. Then a 2nd controller failed, and I've sent the past 4 replacements back. They all fail within 48 hours of burn-in testing, on the provided firmware or the latest. A RAID 5 array of 5 to 7 drives will at times crash the system so badly that the card is not detected unless the system is powered down. A RAID 5 array of 4 HDs will also fail, but it takes a few days instead of hours. The QA people will not talk to me as I don't use their approved motherboards, but I've got 3 different motherboards (all Asus, 2 AMD, one Intel) which I use for testing, and a failing card fails on all of them. The failures are basically a flurry of parity errors. Typically one will see messages about the card being unresponsive and being reset, and then it just hangs outright and corrupts the data being manipulated.

Right now I can't trust the cards. Only a burn-in test for a few days reveals if a card will be stable under load. Sending them in for warranty replacement seems to be a method to just swap a flaky card for a different flaky card!

0

I've had excellent results with the 3ware 9650SE. I have owned several of them: a few 2-port cards, a pair of 4-port cards, and one 12-port that I got used for a great price. I usually plug them into the PCI-e slot that is used for a video card, and they just work.

I have, however, found a BIOS setting that causes them to crash. It's called the PCI Latency Timer. I use a lot of AMD mainboards, and those that have this BIOS option default to 64. Unless I set it to 32, nothing is stable.

Anyway, I'm about to upgrade one array to 5 x 2TB drives and I'll have to swap controllers, so your answers have given me hope.

Is the information about the array written to the drives? Is that how a different controller can import the array? (I need to see how that's done.)

  • I realize I'm bringing this back from the dead, but yes, 3Ware cards do store the information about the array setup on the first few blocks of the drives. – Kendall Sep 06 '11 at 15:24