4

We've got four servers in use for a mission-critical application that all need constant connectivity to each other--six always-up connections in total. I need a way to monitor these connections and fire off, at the very least, an email when any one of them goes down. I can find centralised solutions, but nothing that really fits this bill. Any suggestions?

EDIT: Went ahead and rolled my own in Ruby. Nagios looks like a decent bit of kit, though--would've gone with it otherwise.

Brennon Bortz

3 Answers

2

Like MarkM, I was going to recommend Nagios, but I think you need to plan out more carefully what you are actually measuring. With 4 equivalent nodes I would expect 12 connections to be involved (ab, ac, ad, ba, bc, bd, ca, cb, cd, da, db, dc), unless the connections are bi-directional, in which case there are 6.
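To make the counting concrete, here is a minimal sketch in Python that enumerates both cases. The node names `a` to `d` are placeholders, not the real hostnames.

```python
from itertools import combinations, permutations

# Hypothetical node names; substitute the real hostnames.
nodes = ["a", "b", "c", "d"]

# Bi-directional links: each unordered pair is counted once.
undirected = list(combinations(nodes, 2))

# One-way checks: every node tests every other node.
directed = list(permutations(nodes, 2))

print(len(undirected))  # 6
print(len(directed))    # 12
```

If each node has to verify its own outbound connections independently, the directed list is the one to monitor.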

With Nagios you can either define active checks that run at intervals, or have the daemon wait to receive a status notification (a passive check; in this case, a failed-communication report from the initiating server). It can also trigger automatic response handling, such as restarting a crashed webserver process. But you do need to think about how you deal with split-brain scenarios.

You can run the Nagios daemon on a dedicated server, or on one, or any number of the nodes in the cluster - but beware of launching automatic responses from multiple monitoring nodes simultaneously.

C.

symcbean
  • Nagios looks great, but again, I'm in a solely Windows environment, unfortunately. I realise that I can run Nagios on a dedicated server (which could be virtualised on one of the boxes), but this doesn't really address the need to check for connectivity between all nodes independently. Virtualising a Linux install on each Windows box would be a waste of resources, as well. – Brennon Bortz Jul 12 '10 at 14:20
  • You can't sensibly do this with agentless monitoring regardless of how much you spend. You can run Nagios on each and every node - or just on one node and run nsclient++ on the others. There's no need for any virtual servers. – symcbean Jul 14 '10 at 11:37
1

Nagios is open source, free, cross platform and reliable.

MDMarra
0

If all you're looking for is an email when one server can't connect to another, and checking at most once per minute is frequent enough, this could be as simple as writing a quick script (in VBScript or PowerShell) that pings the other host (or checks a specific port, depending on your application) and emails you if it can't connect.
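As a rough illustration of the port-check variant, here is a minimal sketch in Python rather than VBScript or PowerShell. The function names, hostnames, port, addresses and SMTP server are all hypothetical placeholders, and it assumes the application listens on a TCP port and that the mail server accepts unauthenticated mail from the node.

```python
import smtplib
import socket
from email.message import EmailMessage


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def build_alert(local, peer, port, sender, recipient):
    """Compose the alert email for one failed connection."""
    msg = EmailMessage()
    msg["Subject"] = f"Connection down: {local} -> {peer}:{port}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"{local} could not open a TCP connection to {peer}:{port}.")
    return msg


def check_peers(local, peers, port, sender, recipient, smtp_host):
    """Check every peer once, email an alert per failure, return the failures."""
    failed = [p for p in peers if not can_connect(p, port)]
    for peer in failed:
        with smtplib.SMTP(smtp_host, timeout=10) as smtp:
            smtp.send_message(build_alert(local, peer, port, sender, recipient))
    return failed
```

Each node would run `check_peers` with its own name and the other three nodes as `peers`, so that every connection is tested from the side that initiates it.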

Here is some sample code from Microsoft on how to ping via VBScript and some for how to send email using a CDO object.

In PowerShell, you could use the System.Net.NetworkInformation.Ping object.

Once you've got the script, all you need to do is schedule it as a task with a daily trigger that repeats every minute.

Obviously this is only good if the server that can't connect can get to your mail server to email you.
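One way to soften that limitation is to record the alert locally whenever the mail server is unreachable, so the failure is at least visible afterwards. The following is a minimal Python sketch under that assumption; the function name, its parameters and the logger are hypothetical.

```python
import logging
import smtplib
from email.message import EmailMessage


def send_or_log(subject, body, sender, recipient, smtp_host, smtp_port, log):
    """Try to email the alert; if that fails, log it locally.

    Returns True if the email was handed to the mail server.
    """
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body)
    try:
        with smtplib.SMTP(smtp_host, smtp_port, timeout=10) as smtp:
            smtp.send_message(msg)
        return True
    except (OSError, smtplib.SMTPException) as exc:
        # Mail server unreachable or rejected the message: keep a local record.
        log.error("Could not send alert %r: %s", subject, exc)
        return False
```

This does not replace an external monitor: a node that is cut off from everything can only note the problem in its own log.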

David