
I have a fairly complex problem, and while I have found solutions to the individual steps (and have already applied some of them in other contexts), I am not quite sure how to carry out the whole procedure properly. The system is a 24/7 development server running Ubuntu 12.04; data loss is absolutely unacceptable, but downtime is fine. Right now the server runs a software RAID-6 across five 2.5TB disks, giving 7.5TB of usable storage. One disk is beginning to fail, and since space is getting scarce, we decided to increase the capacity while replacing it. Summing up:

NOW: 5 × 2.5TB disks in software RAID-6 (7.5TB usable), LVM on top; /boot is on a separate drive, all other file systems live on this RAID

AFTER: 4 × 4TB disks in software RAID-6 (8TB usable, with the option to add more disks later), the same file hierarchy on top

I know how to increase the capacity by replacing each of the 5 disks one by one (it will take ages, but that is acceptable). After the last disk has fully synced, the RAID volume can be grown to the new size (12TB), and LVM should be able to take advantage of the extra space. Please correct me if I am wrong here. However, since we want to end up with only 4 drives, I am unsure how to proceed. The new RAID volume would still be bigger than what LVM currently occupies, but I am not sure about the migration procedure. Unfortunately, there are only ~600GB of free space, so I cannot shrink the existing RAID-6 first, although I could imagine freeing up space by copying data to an external drive.
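To make the first part concrete, the per-disk replacement and the final grow should look roughly like this, if I am not mistaken (a minimal sketch only; /dev/md0, /dev/sda1 and /dev/sdf1 are placeholders rather than my actual device names, and it assumes the LVM PV sits directly on /dev/md0):

```
# Repeat for each of the 5 members, waiting for the resync to finish every time
mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1
# ...physically swap in the 4TB disk and partition it as /dev/sdf1...
mdadm /dev/md0 --add /dev/sdf1
watch cat /proc/mdstat    # wait until the rebuild completes before the next swap

# Once all 5 members are 4TB, grow the array and then the PV on top of it
mdadm --grow /dev/md0 --size=max
pvresize /dev/md0
```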

Andreas
    "data loss is absolutely inacceptable, downtime is ok." <- sounds like 1) you better make sure you have a well-tested backup regimen in place and 2) get new storage hardware and migrate your data from one live array to the other. Anything short of that is going to mean either excessive downtime or risk of some fluke destroying the data mid-migration. – EEAA Feb 03 '15 at 14:02
  • Honestly, with drives of that size, **any** type of parity RAID scares the snot out of me due to the huge rebuild times. As long as you're migrating things around, maybe consider RAID 10? – EEAA Feb 03 '15 at 14:06
  • Does your new 4TB RAID need to be growable in the future, or why did you choose RAID-6 with 4 disks over RAID-10? For the problem itself, I agree with EEAA: it is much less painful to create a new array, migrate the data, and then switch over to the new array. Minimum downtime, and no additional risk of data loss. – Henrik Feb 03 '15 at 14:06
  • "Although I could imagine freeing space by copying data to an external drive." Errr... where's your backup then? In a well-organized environment with proper backups & networking, you'd simply set up a fresh new RAID-6 array and restore the server from backups. Rebuilding each hard disk in turn is just about the least efficient way of doing this. – DutchUncle Feb 03 '15 at 14:22
  • @Andreas - just out of interest, have you got 4TB disks rated for a 100% duty cycle? Worth checking. – Chopper3 Feb 03 '15 at 14:26
  • @EEAA I agree that rebuild times will be long. I prefer not to elaborate on RAID-10; both strategies have their pros and cons. Also, I know the risks of UREs, but that is a different discussion. – Andreas Feb 03 '15 at 14:30
  • @Henrik Growing in the future is exactly the reason for RAID-6; once synced, it also provides higher protection against multiple failures, although sync times should be considered here as well. Again, this is not a RAID-10 vs. RAID-6 discussion. Given the highly custom server setup, just replacing drives and waiting for them to sync is the least pain. – Andreas Feb 03 '15 at 14:32
  • @DutchUncle Not all data needs to be stored redundantly; some of it could be moved out of the way before the RAID operations. That might amount to as much as ~2TB. – Andreas Feb 03 '15 at 14:35
  • Wow, you have a lot of problems. "Given the highly custom server setup"... so how would you ever rebuild this server from bare metal? Is the setup documented at least? Do you have any backups of the system? Hammering this RAID-6 array with multiple rebuilds is a very high-risk strategy. You may find out the hard way that it's more than 1 or 2 hard disks that are about to die on you. – DutchUncle Feb 03 '15 at 14:49
  • @Chopper3 I am not sure I understand your question. Could you please be more precise about what you mean? – Andreas Feb 03 '15 at 14:50
  • Bear in mind that a 2nd and/or 3rd disk will fail BEFORE the rebuild is complete... – DutchUncle Feb 03 '15 at 14:50
  • Yes, a lot of problems: what Chopper3 means is that there are different types of hard disks, with different duty cycles. Are yours rated to be used 24/7 (i.e. continuously)? Or are they desktop hard disks? – DutchUncle Feb 03 '15 at 14:52
  • @DutchUncle Please, I know the math. Let's not discuss the risk of data loss due to multiple UREs; we're not talking RAID-5 here. Let's also not discuss why the server setup is the way it is. If it's any easier, let's assume the 5 disks are 2.5MB and I want to migrate to 4 × 4MB drives. – Andreas Feb 03 '15 at 14:53
  • @Chopper3 all drives are rated for 24/7 operation – Andreas Feb 03 '15 at 15:03

1 Answer


With (open)ZFS or Btrfs you could actually do this wacky migration, but it would still be inefficient.

Even if the built-in software RAID of Ubuntu 12.04 can do it, I would still advise against it. EEAA should turn his comment into an answer, because I think it's the correct one: migrate the data from the old array (or better yet, from your backups) to the fresh new array.
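If you can attach both sets of disks at the same time, the move itself is the easy part. Roughly (a minimal sketch only; every device name, VG name, and mount point below is a placeholder, not something from your actual setup):

```
# Build the fresh 4-disk RAID-6 (check lsblk first; /dev/sd[b-e] are placeholders)
mdadm --create /dev/md1 --level=6 --raid-devices=4 \
    /dev/sdb /dev/sdc /dev/sdd /dev/sde

# Recreate the LVM layout on top of the new array
pvcreate /dev/md1
vgcreate vg_new /dev/md1
lvcreate -L 4T -n data vg_new
mkfs.ext4 /dev/vg_new/data

# Copy everything over while the old array is still readable
mkdir -p /mnt/new
mount /dev/vg_new/data /mnt/new
rsync -aHAXx --numeric-ids /srv/old/ /mnt/new/   # /srv/old = old LV mount point
```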

Keep the old RAID disks around for a while, as a 'snapshot', but it should be easier to recover the data from your backup & recovery system...

DutchUncle
  • Unfortunately, the server hardware does not support that; otherwise I would have replaced the failing drive and increased space by adding another one. – Andreas Feb 03 '15 at 15:04
  • +1 from me. I completely agree with your endorsement of EEAA's comment; if downtime's no object, then down the array, back it all up to e.g. tape, verify the backups, nuke and recreate the array, and restore. When I think of failing out discs for bigger ones one at a time, a lovely line from [the old BBC series "*Yes, Minister*"](http://en.wikipedia.org/wiki/The_Writing_on_the_Wall_%28Yes_Minister%29) comes irresistibly to mind: "*If you're going to do this damn silly thing, don't do it in this damn silly way*". – MadHatter Feb 03 '15 at 15:04
  • +10 to you :-) Thanks man, very much appreciated! – DutchUncle Feb 03 '15 at 15:11
  • @MadHatter if "data loss is absolutely unacceptable" then he should already have backups :P – JamesRyan Feb 03 '15 at 15:34
  • @JamesRyan I completely agree. – MadHatter Feb 03 '15 at 15:50
  • I will correct the "unacceptable" in the question later. Let's say I am willing to accept the risk of data loss during the sync of a RAID-6, which has full double parity. I do not think this is as silly a thing as it would be with RAID-5, and I prefer not to be insulted; if you think differently, please do the math. Creating a new array and moving everything over is one option, but due to hardware limitations it is not possible, I'm afraid. Also, we do have backups of the most important data; my backup strategy is off topic as well. – Andreas Feb 03 '15 at 15:53
  • @Andreas just because you don't like the answer doesn't make it off topic. To do the disk-by-disk rebuild you are reading the entire array 6 times and writing it once, plus you will need a 5th 4TB drive. If you have a proper backup, you can take all the existing disks out and keep them safe in case something goes wrong, put the 4 new drives in, and restore once. Finished in a fraction of the time and with far less risk. – JamesRyan Feb 03 '15 at 16:29
  • @JamesRyan I agree, but unfortunately I can't do it as straightforwardly as that, which is why I asked this question. I guess it is better to update the question and make it more precise. Until then I am willing to agree that it is a silly thing to do, but an interesting challenge :-) – Andreas Feb 03 '15 at 17:23
  • *Interesting challenge* and *production data* are two concepts that don't sit well alongside each other, at least not to the audience around here. Your insistence, with no public justification, on doing this in a needlessly risky way, even though you've been told several times, by several people, the safest and most professional way to do it, has persuaded me to vote to close this question on the grounds of reasonable management practices, though of course others may not agree. – MadHatter Feb 03 '15 at 18:23
  • In this case I might as well delete the question, although I do not agree with your reasoning. The question was about whether such a migration is possible at all in the way I need to do it, given the limitations in hardware and admin time that we have, not whether it makes sense or is best practice. The safest and most professional way, as you put it and as many here proposed, is known to me but unfortunately not feasible; I would never have asked the question if it were. Anyway, I found and tested a way to make it happen. If somebody still reads this discussion, they may contact me. – Andreas Feb 03 '15 at 22:05
  • We understand you feel you have constraints, but you won't explain *why* they are constraints. If they're really unavoidable, sometimes the correct response is "*the job can't be done as described*"; much of professional sysadmin work is speaking truth unto power. In any case, I agree: the question should either be deleted, or you should document your procedure in an answer and, when the system permits, accept it. Either of these will stop the question floating around unanswered forever, though the latter means it might be of some use to future readers, so it is better. – MadHatter Feb 04 '15 at 08:42