I think the title speaks for itself.
But, to give an example: in a recent post, 37signals shows its real downtime and compares it with other web services. They have very little downtime, and most companies probably don't achieve that. But to measure it you would need a bulletproof monitoring system with 100% uptime, or at least some kind of heuristic to approximate that. In their case they use Pingdom, but any similar service should be capable of doing the same.
So, how do they do that? Do they keep 2 or 3 servers polling the site and take an average, discounting their own downtime? Is it trivial or complex?
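To make my question concrete, here is the kind of thing I imagine (a minimal sketch in Python; the health-check URL, the number of probes, and the quorum rule are all my own assumptions, not how Pingdom actually works):

```python
import urllib.error
import urllib.request

# In a real setup each "probe" would run on a separate machine in a different
# network/region; here they are just simulated as repeated checks.
TARGET_URL = "https://example.com/health"   # hypothetical health endpoint
NUM_PROBES = 3

def single_probe(url, timeout=5):
    """One probe's verdict: True = up, False = down, None = probe itself failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except urllib.error.URLError:
        return False            # target unreachable from this probe
    except Exception:
        return None             # probe-side problem: produce no data point

def check_once():
    """Aggregate the probes' votes and discard intervals with too few answers."""
    votes = [single_probe(TARGET_URL) for _ in range(NUM_PROBES)]
    valid = [v for v in votes if v is not None]
    if len(valid) < 2:
        return None             # monitoring itself degraded: exclude this interval
    return sum(valid) * 2 > len(valid)   # strict majority says "up"
```

The idea would be that intervals where the monitoring itself is degraded (`None`) get excluded from the uptime denominator instead of silently counting as "up", which is how the monitor could discount its own downtime. Is that roughly what happens?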
P.S.: A better definition of "precision" here would be measuring without mistakes, i.e. without missing any downtime. If the service is down, you know it, 100% of the time. Otherwise you end up with a biased measurement.
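To make that bias concrete, a rough back-of-the-envelope calculation with made-up numbers:

```python
# Hypothetical figures: how a monitor's own downtime biases the result
true_uptime    = 0.999   # the service is really up 99.9% of the time
monitor_uptime = 0.99    # the monitor itself is only up 99% of the time

# If the intervals the monitor misses are silently counted as "up",
# only monitor_uptime of the real downtime is ever observed:
true_downtime     = 1 - true_uptime
observed_downtime = true_downtime * monitor_uptime
reported_uptime   = 1 - observed_downtime

print(f"true uptime:     {true_uptime:.4%}")      # 99.9000%
print(f"reported uptime: {reported_uptime:.4%}")  # 99.9010% -- optimistically biased
```

The error is small here, but it always points in the flattering direction, which is exactly the kind of bias I mean.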