3

Does anybody have experience with both Linux and Windows failover redundancy clusters, and if so which do you prefer for file server and/or web server?

For a little background: we set up and administered a Microsoft cluster for several years under Windows 2000. The cluster was a pair of web servers with a fairly large (for the time) RAID array for multimedia storage being served to the web: hundreds of thousands of mp3 commercials being served to radio stations. There were a number of things we did not like about this setup. First, the Microsoft cluster used a shared storage array. Even though it was hot-swap RAID, all it took was slight corruption of the NTFS file system on the drive, and all of a sudden you were down for several hours while ChkDsk ran.

So on the next build we bought into a product called NeverFail - http://www.neverfailgroup.com/ This product replicates the data between the primary and secondary server, automatically keeping it synchronized at the block-write level. This has eliminated the problems we had with shared storage, but it has introduced its own issues. Any restart requires a data resync in which the system checks everything for synchronization. The system stays up and available during this sync, but on a server with less than a terabyte of mp3 files it takes several hours, and a typical Microsoft patch session requires a couple of these resyncs. So it often takes us upwards of 2 days to patch the 2 machines. As a result, we find ourselves putting off patching and not doing it as frequently as we should. The process is also touchy and has to be followed exactly.

So we are considering moving the main site with all of this content to a pair of LAMP boxes with Linux HA and DRBD.

So I am curious if anybody has experience administering both Linux and Windows clusters who might tell me what they experienced. Specifically we are wondering about resync time on restarts, etc, and overall experience administering such a Linux system.

While we have traditionally been a Windows shop, we now have a guy in house who knows Linux, and I am learning as well and have added a number of Linux boxes to our system, so we are open to that from an administration point of view.

AudioDan

2 Answers

4

I love the Linux HA stuff, and DRBD is now reaching a very high level of awesome. The Windows equivalents have never provided anywhere near the same level of stability and configurability in the situations I've run into them.

womble
  • Womble - so you have used both in production environments? What kind of problems/issues did you have with each? – AudioDan Jun 14 '09 at 12:51
3

A few observations; I'm sure others will have more data. First, DRBD has been around longer than the native Windows directory sync tools, so it may be more robust. Second, Windows 2008's DFS/Replication technology has been rewritten to perform better. It hasn't been around as long as DRBD, but it promises to be able to replicate large directories between multiple servers. DFS/Replication doesn't do it at the block level like DRBD does, just at the file/directory level. Full resyncs with DFS/Replication are online rather than offline, so you won't have the same service outage you had with NeverFail.
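To make the block-level point concrete, here is a minimal sketch of why tracking changed blocks can make a resync cheap. It is a toy model, not DRBD's actual implementation: the class `ReplicatedVolume`, its methods, and the block size are all hypothetical names chosen for illustration. DRBD does keep a bitmap of out-of-sync blocks in its metadata, and the sketch assumes a replicator that works along those lines.

```python
BLOCK_SIZE = 4096  # bytes per tracked block (hypothetical value)


class ReplicatedVolume:
    """Toy primary volume that remembers which blocks its peer has missed."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE)] * num_blocks
        self.peer = list(self.blocks)   # stand-in for the secondary's disk
        self.dirty = set()              # block indices the peer has not seen
        self.peer_online = True

    def write(self, index, data):
        self.blocks[index] = data
        if self.peer_online:
            self.peer[index] = data     # normal synchronous replication
        else:
            self.dirty.add(index)       # record only what the peer missed

    def full_resync(self):
        """Compare every block; work grows with the size of the volume."""
        examined = 0
        for i, block in enumerate(self.blocks):
            examined += 1
            if self.peer[i] != block:
                self.peer[i] = block
        self.dirty.clear()
        return examined

    def bitmap_resync(self):
        """Copy only blocks marked dirty; work grows with what changed."""
        examined = 0
        for i in sorted(self.dirty):
            examined += 1
            self.peer[i] = self.blocks[i]
        self.dirty.clear()
        return examined
```

In this model, if the peer goes away, a handful of blocks are written, and the peer comes back, `bitmap_resync()` touches only those blocks, while `full_resync()` walks the whole volume no matter how little changed. Whether a real restart behaves like the first or the second case depends on the product and on how the node went down.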

sysadmin1138
  • Actually the resyncs are online with NeverFail as well, so that is not an issue. We can retain fairly constant uptime with NeverFail. The problem is more that about 4 resyncs have to take place during a single critical update session in order to get both servers updated, and each resync takes several hours. I have had at least one Linux guy tell me that these large directories can be resynced much faster with DRBD: that a large directory can take minutes rather than hours to check when nothing has changed. Is this true? – AudioDan Jun 14 '09 at 12:41
  • So is there now a way to do HA with service failover in Windows without a shared quorum drive, using DFS/Replication and some heartbeat-style service to determine when to fail over? – AudioDan Jun 14 '09 at 12:49
  • Also, the one other factor in this is that enterprise Windows 2008 is freakin expensive, and Debian is not, provided we have the know-how and experience to set it up and administer it. I think we are going to try some lab tests this summer. – AudioDan Jun 14 '09 at 12:50