
I have a web system and I need to calculate the uptime of the whole system. It consists of a load balancer (at the moment without a virtual IP or a redundant load balancer), one database, and two servers running as a cluster, all hosted at a hosting provider.

Can someone tell me, in rough steps, how to take all of this into account and estimate the uptime of the system?

How is the uptime of a complex system calculated?

I know this is difficult to answer, but please explain some general methods.

jscott
darko petreski

2 Answers


In general, you have that whole setup precisely so that you don't particularly care if one part of it goes down, as long as the customer-facing part is still up. Some uptime checkers only look for a 200 HTTP response from your website (even if that response is a page covered in SQL errors); others are a little more specific.

In general, this is a question for your business plan/SLA, and you need to write it. What do you need? Does it matter if users can't log in as long as everything else works? Do you only need your index page to be up? Or does the whole thing need to load before you count it as uptime?

Stop trying to calculate the uptime for everything, and only measure what's important - the end result.
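To illustrate the difference between a checker that only looks for a 200 and one that is "a little more specific", here is a minimal Python sketch. The marker strings (`REQUIRED_TEXT`, `ERROR_MARKERS`) and the function names are hypothetical; you would replace them with text that really distinguishes a healthy page from a broken one on your own site.

```python
from urllib.request import urlopen
from urllib.error import URLError

# Hypothetical markers: choose strings that match your own pages.
REQUIRED_TEXT = "Welcome"
ERROR_MARKERS = ("SQL error", "Fatal error", "Traceback")


def looks_healthy(status, body, required=REQUIRED_TEXT, error_markers=ERROR_MARKERS):
    """Return True only for a 200 response that contains the expected text
    and none of the known error markers."""
    if status != 200:
        return False
    if required not in body:
        return False
    return not any(marker in body for marker in error_markers)


def check_url(url, timeout=10):
    """Fetch the URL and apply looks_healthy; HTTP and network errors count as down."""
    try:
        with urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return looks_healthy(response.status, body)
    except (URLError, OSError):
        return False
```

Running `check_url` on a schedule and dividing the number of successful checks by the total number of checks gives a measured uptime for the thing the customer actually sees.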

James L
  • Thanks, this helps. Do you know of any uptime checker programs available in the Ubuntu distribution or elsewhere? (I would like to use my own checker.) – darko petreski Nov 09 '10 at 13:38
  • I agree with James; what's important is whether your customers can do what they want to do. Work out what that is, and measure it. I've had a lot of luck getting developers to put a single page inside the app that tests the pathways we want tested and writes simple YES/NO answers into an HTML page, which is then easy to test with e.g. Nagios. – MadHatter Nov 09 '10 at 14:10
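As a rough sketch of the YES/NO status page idea from the comment above, the following Python function turns such a page into a Nagios-style result. The page format (one `name: YES` or `name: NO` per line) is an assumption made for illustration, not something the comment specifies; the exit codes follow the usual Nagios plugin convention (0 = OK, 2 = CRITICAL, 3 = UNKNOWN).

```python
# Nagios plugin exit codes
OK, CRITICAL, UNKNOWN = 0, 2, 3


def evaluate(page_text):
    """Parse lines of the assumed form 'name: YES' / 'name: NO' and
    return (exit_code, message) in Nagios plugin style."""
    results = {}
    for line in page_text.splitlines():
        name, sep, answer = line.partition(":")
        answer = answer.strip().upper()
        if sep and answer in ("YES", "NO"):
            results[name.strip()] = answer

    if not results:
        return UNKNOWN, "UNKNOWN - no checks found on status page"

    failed = [name for name, answer in results.items() if answer == "NO"]
    if failed:
        return CRITICAL, "CRITICAL - failing: " + ", ".join(failed)
    return OK, "OK - %d checks passed" % len(results)
```

A wrapper script would fetch the status page, print the message, and exit with the returned code so the monitoring system can pick it up.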

Work from the bottom (electricity, cooling...) to the top (software layer). Even the best software and the best clustering solution won't help you if you have everything in one data center and that suddenly goes down.

Your question is very complicated, and when calculating you have to take into account at least these factors:

  • How is your data stored? In one data center? In multiple data centers?

  • Are the data centers reliable? How about the network connection(s) between them?

  • Are your routers, load-balancers, servers and other equipment reliable or do you need to replace parts often?

  • While doing regular maintenance, do you need to take your entire site down, or are you able to update software etc without taking your site offline?

  • How are you prepared for external attacks, such as DDoS?

  • What if something goes wrong with your database, file server or other critical components? Yeah yeah, you mentioned they are clustered. That doesn't mean they cannot go down.

  • How fast can you recover from backups?

  • What do you consider as "site is up"? Front page up? Existing logged-in users working OK but not able to add/remove accounts? Site has to respond in no more than X seconds? Everything has to be 100% top notch?
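Once you have an availability estimate for each component, a common first approximation (not something stated in this answer, and one that assumes components fail independently) is to multiply the availabilities of components that are all required, and to combine redundant components as one minus the product of their unavailabilities. The sketch below applies that to the setup from the question; every figure in it is a made-up placeholder, not a measurement.

```python
from math import prod


def series(*availabilities):
    """All components are required: availabilities multiply."""
    return prod(availabilities)


def parallel(*availabilities):
    """Any one component is enough: the group fails only if all fail."""
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical per-component availabilities (fractions of time up).
hosting = 0.9995        # power, cooling, network at the provider
load_balancer = 0.999   # single load balancer, no redundancy
web_server = 0.99       # each of the two clustered servers
database = 0.999        # single database

web_cluster = parallel(web_server, web_server)
system = series(hosting, load_balancer, web_cluster, database)

downtime_hours_per_year = (1 - system) * 365 * 24
print(f"Estimated availability: {system:.4%}")
print(f"Estimated downtime: {downtime_hours_per_year:.1f} hours/year")
```

The result is never better than the weakest non-redundant component, which is why the single load balancer and single database dominate in a layout like this. The factors listed above (maintenance windows, attacks, recovery time from backups) are not captured by the formula and have to be folded into the per-component figures.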

Or, if by calculation you meant monitoring so you can see overall trends and current status, then take a look at Nagios.

Janne Pikkarainen