
I have a two-disk RAID 1 set up with mdadm. Long story short, it turns out this server has an actual RAID controller, so a hardware RAID would be preferable to the software RAID. In addition, the software RAID keeps unexpectedly disconnecting disk 2, which drops the array into degraded mode. So we want to try out the hardware RAID.

Before we go that route, though, we want to undo the RAID and just have the OS running off a single disk. My question is, how can I effectively do that?

I'm guessing I'll need to edit my /etc/fstab file, which currently has / mounted on /dev/md0. I'm also guessing that grub will somehow have to be updated? I'm unsure what else I'll need to do or in what order to do all this. My first inclination is to boot from a live CD and dd the data from disk one to disk two for an exact mirror (the RAID mirror has been degraded for a little while), and then start from there.
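
Something like this is what I had in mind, I'm guessing (device names assumed; this would overwrite everything on the second disk):

    # from a live CD: clone disk 1 onto disk 2 - destroys all data on /dev/sdb
    sudo dd if=/dev/sda of=/dev/sdb bs=4M conv=noerror,sync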

I'm running Ubuntu Server. Thanks for any suggestions.

Safado

3 Answers


Your best, safest bet is to run a full backup, then wipe and reformat the volume, and then restore the data.

Bart Silverstrim
  • But doing a full backup and restoration would still maintain the RAID configuration, right? So what I'm looking at is manually backing up all services, databases, etc. Yuck. – Safado Apr 20 '11 at 14:59
  • If you back up your /etc files that pertain to your configuration, you probably would. You'd need to save your data directories instead. You're going to have other problems trying to untangle things in the long run. – Bart Silverstrim Apr 20 '11 at 16:26

Your options are really limited to backing up the box and restoring it on a rebuilt RAID array -- typically initializing the hardware RAID will wipe your disks.

Even if it didn't, you would have to unwind the md configuration, or alternatively break the mirror apart and run the software RAID degraded inside the hardware RAID array -- frankly, backing everything up and starting over is probably the least painful option...
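
(as a rough sketch of what that backup and restore could look like - "backuphost" and the paths are placeholders, not anything specific to your setup:)

    # stream a full backup off the box before re-initializing the array
    sudo tar --one-file-system -czpf - / | ssh backuphost 'cat > root-backup.tar.gz'

    # later, with the freshly built hardware array mounted at /mnt:
    ssh backuphost 'cat root-backup.tar.gz' | sudo tar -xzpf - -C /mnt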

voretaq7

for RAID-1, you're almost certainly much better off with linux mdadm software raid on a simple JBOD HBA than with any hardware raid controller. (e.g. LSI's got a very nice 8-port SAS 6Gbps HBA - it also does SATA 6Gbps, of course - for around $200.)

the only real advantage to HW raid is if it has a non-volatile (battery-backed or the newer flash-based) write cache. and it has to be non-volatile (to protect against crashes or power failures), otherwise it's no better than linux's disk caching anyway. not all hw raid cards have battery-backup or flash installed, and not all even have it as an option.

and even then you can get the same effect by using an SSD as a write cache w/ mdadm. e.g. bcache and facebook's flashcache are two implementations of the idea. they're new, so i wouldn't risk using them on a production system just yet (OTOH, facebook has probably done extensive real-world testing of their flashcache under extremely high loads).

(btw, if you're talking about fakeraid - the kind of raid you get in cheap cards or built into mainstream motherboards - then forget about it. using that is nowhere near as good as linux's software raid)

you do seem to have a real problem that needs to be solved, though.

it sounds as though you've got problems with one of your disks (in which case, replace it ASAP), or possibly with the sata port that it is plugged in to. try plugging the disk into another port.
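
(a quick way to check the disk itself is smartmontools - a sketch, assuming the flaky disk shows up as /dev/sdb:)

    # overall SMART health verdict for the suspect drive
    sudo smartctl -H /dev/sdb
    # full attribute table - watch the reallocated/pending sector counts
    sudo smartctl -A /dev/sdb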

also check that all the cables are securely plugged in, and that your power supply is adequate for your system (most will be, but e.g. if you have a high-end graphics card drawing 200W plus motherboard and several drives on a 300W PSU, then you'll need a better PSU).

hope that helps.

to get a better answer, you'll need to provide more details like:

  • what kind of system (esp. motherboard if it's a whitebox clone rather than a name-brand server)
  • what kind of disk controller
  • samples of error messages - e.g. what does the kernel say when it kicks a disk out of the array (a sketch of how to pull these is below)
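
for the error messages, something like this would pull the relevant bits (a sketch - the log path is the usual Ubuntu default):

    # current state of the array
    cat /proc/mdstat
    sudo mdadm --detail /dev/md0

    # kernel messages about the disks and the md layer
    dmesg | grep -Ei 'ata|sd[ab]|md0'
    grep -Ei 'ata|sd[ab]|md0' /var/log/kern.log | tail -50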

PS: as a direct answer to your question, undoing a raid-1 is easy. just edit /etc/fstab so that it mounts the partition directly, and re-configure grub to suit. e.g. if /dev/md0 was made up of sda1 and sdb1, then you can just mount /dev/sda1 (or sdb1) instead of /dev/md0. that's one of the really nice things about software raid 1 (dunno if you can do the same w/ hw raid cards - they tend to use weirdo proprietary formats).

you should then be able to plug the other drive into the hw raid card, set it up as a degraded raid-1, reboot into /dev/sda1 as root, format the degraded raid, mount it, rsync your filesystem to it, and make sure you've got the grub MBR installed onto it. you'll probably need to edit /etc/fstab after you copy it. then reboot to use the degraded raid as your root fs. (a rough sketch of the whole sequence is below.)
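
something like this, roughly - note that the device names are assumptions on my part (/dev/sda1 for the surviving half of the mirror, /dev/sdb1 for the new degraded hw array), so substitute your own:

    # 1. find the UUID of the underlying partition (assumed /dev/sda1)
    blkid /dev/sda1

    # 2. in /etc/fstab, replace the /dev/md0 root entry with that partition,
    #    e.g. change "/dev/md0 / ext4 ..." to "UUID=<uuid-from-blkid> / ext4 ..."

    # 3. reinstall grub's MBR on the plain disk and regenerate its config
    sudo grub-install /dev/sda
    sudo update-grub

    # 4. once booted off /dev/sda1, format the degraded hw array (assumed to
    #    appear as /dev/sdb1, assuming ext4), mount it, and copy everything across
    sudo mkfs.ext4 /dev/sdb1
    sudo mount /dev/sdb1 /mnt
    sudo rsync -aAXH --exclude={"/proc/*","/sys/*","/dev/*","/run/*","/mnt/*"} / /mnt/

    # 5. put grub's MBR on the new disk too, and fix /mnt/etc/fstab
    #    to point at the new root before rebooting onto it
    sudo grub-install --root-directory=/mnt /dev/sdb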

if (and only if) that works, you can shut down, pull the other drive out of the non-raid slot, plug it into the hw raid card, and add it to the raid-1 array. then reboot, and you're done.

NOTE: DON'T BOTHER DOING THIS if the "raid" card is fakeraid. it's not real hardware raid, but has all the disadvantages (and more) of hw raid without ANY of the advantages. software raid is much better.

Rick Moen has a great page on linux sata & raid controllers at http://linuxmafia.com/faq/Hardware/sata.html. it'll explain why fakeraid is worse than useless.

cas
  • Wow, thanks for the great response. As far as error messages go, here's another post I made with a couple of log entries: http://serverfault.com/questions/241749/2nd-drive-in-raid-1-keeps-failing -- The server is an IBM x336, found here: http://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=MIGR-58488 -- Apparently the OS loses communication with the second disk. I don't even know if switching to a hardware RAID is going to fix my issue. I tried sticking the first disk into the second slot and leaving slot 1 open [cont..] – Safado Apr 27 '11 at 16:24
  • [cont..] and it worked just fine without disconnecting the disk and crashing. I also ran the server off of the second disk by itself and it worked fine. All of this is making me believe it's perhaps not a hardware issue. So trying to use the RAID controller included with the system was my next step. If you know anything about these servers or if you have any additional instructions for me, it's much appreciated! I think this weekend I'll try and follow the steps you provided to undo the RAID. Thanks! – Safado Apr 27 '11 at 16:27
  • I've got an IBM x365 at work, and it has a ServeRAID 8k/8k-l8 raid controller, which uses the adaptec aacraid driver under linux. This *IS* a real RAID controller, not fake-raid. It did have a serious firmware problem a couple of years ago which caused us total loss of the file system (yay backups!)...so i'd recommend searching the IBM site for issues relating to your x336 and its raid controller. maybe a firmware update is in order. – cas Apr 28 '11 at 21:06
  • oops. make that an IBM x3655, not x365. with a quad-core opteron cpu, not a xeon. – cas Apr 28 '11 at 21:42
  • Just out of curiosity, I tried undoing the RAID by editing /etc/fstab and changing grub (it's GRUB2; I just did a straight edit of /boot/grub/grub.cfg and replaced all instances of md0 with hd0), and it still mounted /dev/md0 on / when I booted. Any idea what else I'm missing? – Safado May 02 '11 at 20:26
  • what's "hd0"? you should specify a drive and partition, like /dev/sda1 for the first partition of the first scsi drive in the system. or better yet, run blkid to get the UUID for that device and use the UUID rather than the device name (which is not guaranteed to remain the same, for a variety of reasons) – cas May 05 '11 at 09:27
  • Grub references it as hd0,0 and not /dev/sda1 doesn't it? – Safado May 06 '11 at 19:48
  • yes and no. grub itself uses (hd0,0) etc., but when you're passing args to the kernel (like root=...), you need to give the kernel the kind of args it expects, i.e. a device name/label/uuid. – cas May 11 '11 at 09:52
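
(to make that concrete - a minimal sketch, assuming the root filesystem lives on /dev/sda1:)

    # grub's (hd0,0) only says where to load the kernel *from*;
    # the root= kernel argument is what the kernel mounts as /.
    # a hand-edited grub.cfg entry would look roughly like:
    #
    #   set root=(hd0,0)                              # grub's view: 1st disk, 1st partition
    #   linux /boot/vmlinuz-... root=UUID=<uuid> ro   # kernel's view of /
    #
    # find the UUID with:
    blkid /dev/sda1
    # and on Ubuntu, prefer regenerating grub.cfg over hand-editing it:
    sudo update-grub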