I've got two Dell PowerEdge 2950 servers. In order to (hopefully) eliminate any downtime, I must implement a solution to detect and adjust to component failures, environment failures, etc ... the usual "downtime is the enemy" scenario.

From this point on, I'll refer to the servers as server(s), since the solution implemented may combine the two servers into one logical server (honestly, one logical server would be preferred).

I'll have ~15 thin clients all pointing to the server(s) mentioned above. The server(s) will act as a Terminal Server. The clients will connect to the server(s) and run an instance of a client GUI. The server(s) will also run the server version of the same application, serving the client GUIs with the information/data they require ... (I hope that made sense!)

It's been recommended that I use Marathon Technologies' everRun 2G software. While this seems like a fair solution, it's also $12,000 ... seems a bit pricey to me (that may be me showing my lack of experience in this field, however) ...

Is there a more cost-efficient solution to such a scenario? I've been checking into a solution involving Citrix XenServer at the moment, but have yet to make much headway with it ...

How can one implement fault tolerance to the degree mentioned above?

EDIT: The servers are running Windows Server 2003 Enterprise.

EDIT: To clarify my miscommunication, I'm shooting for failover to the still-running node in the event of a disaster. The application to be hosted provides the control of a large number of electronically-locked doors and intercoms. Therefore, if the application is unavailable, no doors will open and no communication via intercom can occur. Yikes!

EDIT: Well, after some scope changes, funding and other non-technical project adjustments, the solution I'm moving forward with actually uses none of the approaches listed :) Long story short, we're maintaining two separate Terminal Servers: a primary and a hot backup. The switch between the two in an emergency scenario will be manual (although it'll actually be just as fast as, if not faster than, we originally anticipated). The server hardware (two NICs, two battery supplies and two UPSs) will address the failover functionality required. Thanks for all your feedback, greatly appreciated!
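For anyone with a similar primary/hot-backup setup, here is a minimal Python sketch of a check that could tell an operator which server to point the clients at. It is only an illustration, not what was deployed: it assumes Terminal Services listens on the default RDP port 3389, and the function names `is_reachable` and `recommend_server` are made up for this example. The switch itself stays manual; the script only advises.

```python
import socket

RDP_PORT = 3389  # assumption: Terminal Services on the default RDP port


def is_reachable(host, port=RDP_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def recommend_server(primary_up, backup_up):
    """Advise which server the clients should use right now."""
    if primary_up:
        return "primary"
    if backup_up:
        return "backup"
    return "none"
```

A call such as `recommend_server(is_reachable("ts-primary"), is_reachable("ts-backup"))` (both host names are placeholders) returns "primary", "backup" or "none". An open port only shows that the listener is up, not that the door-control application is healthy, so treat it as a first-line check.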

cookbr

6 Answers

Marathon is a very heavyweight system; it effectively halves the capacity of the systems you have. First, I would ensure that you have the basics right, such as shared storage.

Today VMware can provide HA, which effectively means the server is rebooted when one of the systems fails. In the future, VMware will be able to track the machine so that when one of the servers dies, the instance is transparently migrated "live" to the other server.

I would point out that unless you really really need HA then generally it is better to have a simple system that works well than a complex one that should be more reliable but actually isn't.

James
  • Ah, I was unaware about that bit of information on Marathon! A simple system is indeed a target, as any growth on the network itself will (most likely) be in the distant future. Thanks! – cookbr Jul 24 '09 at 16:29

As James mentioned, if it is really that important, it might be worth loading the physical servers with ESX from VMware. With this infrastructure you could use VMotion in conjunction with VMware's HA tools to let the server move seamlessly between physical servers with no downtime for the end user. This does require a SAN as well as a separate box to run the management software, but the management software can run on something as light as a desktop.

Charles
  • Thanks for your feedback. I believe the use of additional machines will be frowned upon, but I see the benefits of such a setup. I'll take your feedback into consideration, thank you. – cookbr Jul 24 '09 at 16:26

Here are some options I see:

Just install everything on the two servers, including Terminal Services, then use the service built into Windows Server that provides a "cluster IP" (Network Load Balancing), so everyone connects to one IP address and the two servers decide who connects to which machine, giving you pseudo load balancing.
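To make the "cluster IP" idea concrete, here is a small Python sketch of how clients could be spread across whichever servers are still alive. It is a simplified illustration and not the algorithm Windows actually uses; the function name `pick_server` and the server names are invented for the example.

```python
import hashlib


def pick_server(client_ip, live_servers):
    """Map a client IP onto one of the servers that are currently alive."""
    if not live_servers:
        raise RuntimeError("no terminal server is available")
    # Hash the client address so the same client always maps the same way
    # for a given set of live servers.
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    index = int.from_bytes(digest[:4], "big") % len(live_servers)
    # Sort so the result does not depend on the order the list was built in.
    return sorted(live_servers)[index]
```

With both servers in the list, clients are split between them by address; if one server drops out of the list, every client maps to the survivor, which is the failover behaviour being asked for.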

Another is to invest in VMware's suite of tools to use a VM for the terminal services, then use VMotion and the high-availability options to keep the VM alive.

Most corporate high-availability setups seem to call for two servers plus a high-speed SAN or iSCSI storage system that holds the VMs or the data shared between the two servers. Your application services then run on the two systems attached to the storage server.

It might be possible to use a Xen install on Linux using DRBD and Pacemaker, but I think maybe just having the "cluster IP" on Windows to dole out connections between two terminal servers may be good enough, maybe with a NAS or other storage server to share an application data directory or home directory for data. Would that work?


I think you edited your question a little? Either that or I skimmed too fast :-)

15 users going to two servers using terminal services; I would think that for budget concerns and management, you may still be best off looking at enabling load balancing built into terminal services.

Some cautions: one user can kill a terminal server for everyone. We had a user leave while logged into a terminal session with an animation on weather.com still running. After a few hours the memory or CPU use ballooned to the point where everyone else was bogged down to a near-unusable state.

Also, if there's a disconnect and a user reconnects to the second server, they may be confused about where the apps they were using went when the network dropped. You can also hit file-sharing issues on the home directory server, because a file is still open on server one while the user is now logged into server two.

In other words, relying heavily on terminal services, regardless of your servers, means having GOOD infrastructure. That means more money for managed switches, reliable cabling and such. You should also have an IT department ready to monitor those servers for anomalies in case a user is hogging resources, since one person's problem can cascade into other users' sessions.
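As a rough idea of what that monitoring could look for, here is a Python sketch that flags sessions over a CPU or memory limit. Everything in it is hypothetical: the `SessionSample` type, the `find_hogs` function and the default limits are made up for illustration, and collecting the per-session figures from the terminal server is left out.

```python
from dataclasses import dataclass


@dataclass
class SessionSample:
    user: str
    cpu_percent: float  # share of total CPU used by the session
    memory_mb: float    # memory used by the session


def find_hogs(samples, cpu_limit=50.0, memory_limit_mb=1024.0):
    """Return the sorted, de-duplicated users whose session exceeds either limit."""
    return sorted(
        {
            s.user
            for s in samples
            if s.cpu_percent > cpu_limit or s.memory_mb > memory_limit_mb
        }
    )
```

Fed with periodic samples, a check like this would have caught the runaway browser session described above long before the other users noticed.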

Bart Silverstrim
  • A variant of what you recommended may indeed suit the task at hand. I'll take what you've provided into consideration, thanks! – cookbr Jul 24 '09 at 16:27
  • Maintenance is definitely something to consider. The users in this scenario are locked down into one application, nothing else. The application is simply click this, click that .. stateless. Check out my second edit in the original post ... – cookbr Jul 24 '09 at 17:01

I question whether you really need high availability or simply need failover to the still-running node in the event of hardware failure. HA is going to be very expensive and, with that small a number of users, I would think that your budget isn't all that high.

Have you considered using Microsoft's built-in Terminal Services Session Directory functionality and a load balancer? You already have the "Enterprise Edition" of Windows Server 2003, so you're already "over the hump" as far as the licensing expense of implementing Session Directory goes.

Some more details: http://download.microsoft.com/download/7/b/3/7b3aa957-4865-427d-9650-789179a5d666/SessionDirectory.doc

You might look at some third-party tools like 2X Loadbalancer, as well. (No personal experience with it, though...)

Evan Anderson
  • Thanks for your feedback, I'll look into your recommendations. As you said, my primary focus is failover to the still-running node, sorry for the miscommunication. – cookbr Jul 24 '09 at 16:17

I'll vouch for Citrix XenServer, but going with VMware is never really a mistake. If anything, it may just hurt the corporate wallet.

As Charles commented, you do need a SAN (or NAS) or some other kind of shared storage to truly take advantage of VMware's High Availability/VMotion features. But to answer your questions:

Is there a more cost-efficient solution to such a scenario? I've been checking into a solution involving Citrix XenServer at the moment, but have yet to make much headway with it ...

Citrix XenServer 5.5 and XenCenter are both free (like ESXi), but IMO have more features that bring you closer to your goal of "eliminate any downtime". Whether you use Xen or VMware, though, you need shared storage that meets the product's requirements.

How can one implement fault tolerance to the degree mentioned above?

Well, your overall goal sounded like high availability, and now you're asking for fault tolerance. Those are two different concepts within the realm of IT. Given all the information presented, I'd say there may be a better alternative than jumping straight into the high costs of virtualization. 15 sessions isn't exactly a tough load for your servers. Maybe virtualizing at this point is a bit much and you can get away without it. Load balance between your two terminal servers to ease the load until more clients are needed, and then look at virtualizing everything.

Another idea: you could virtualize now with either VMware ESXi or XenServer 5.5, but without the HA/VMotion-esque capabilities. Then, when you really need those features, buy the upgrades and plunk down some shared storage between the two servers. This way, you won't have to do P2V conversions beforehand.

osij2is
  • Sorry for my miscommunication, this is unfamiliar territory for me. As Evan Anderson hinted at, I'm looking for failover to the still running node. I'll take your recommendations into consideration, thanks! – cookbr Jul 24 '09 at 16:16
  • No problemo. Glad you got some information to set up/design your system. – osij2is Jul 24 '09 at 19:25

I would say that since you mentioned that a second PC to run the ESX management software would be frowned upon, your only real option is load balancing. Virtually all other solutions will involve the purchase of shared storage, which will probably start at close to what you paid for those two 2950s.

ITGuy24
  • Yes, I believe that may be the route I end up taking. I'm going to evaluate the products mentioned in this question and evaluate the ROI first, however. Thanks for the feedback. – cookbr Jul 24 '09 at 20:09