Summary

I've been running a website on a VPS and just had my first bit of downtime (about 4 minutes). However, the website was only down for me, and I can't see anything in the obvious log files. Where should I be looking?

There are no entries in php5-fpm.log for that time or for 20 minutes either side, and there's nothing in the error log.

The only entries in the nginx access log are from the "Is it down for everyone or just me" service.

Where else should I look?

Detailed

Server: Ubuntu 12.04, LEMP Stack

I was getting the error "This webpage is not available". However, according to other ping checkers the website was only down for me (at my house; multiple computers there couldn't reach it). Other websites worked fine.

It was only down for a couple of minutes and I didn't have time to get someone else to try it. I checked with my domain provider and they said they had no downtime.

Nginx access log:

(Advagg is a Drupal module which aggregates the CSS and JS files. If it fails, the site should just appear without styling.)

127.0.0.1 - - [06/Mar/2014:22:24:20 +0000] "GET /authcache-varnish-get-key HTTP/1.1" 302 46 "http://www.downforeveryoneorjustme.com/mysite.net" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36"
127.0.0.1 - - [06/Mar/2014:22:24:21 +0000] "GET / HTTP/1.1" 302 46 "http://www.downforeveryoneorjustme.com/mysite.net" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36"
127.0.0.1 - - [06/Mar/2014:22:24:38 +0000] "GET /sites/default/files/advagg_css/css__1394144677.css HTTP/1.1" 404 325 "-" "Drupal (+http://drupal.org/)"
127.0.0.1 - - [06/Mar/2014:22:24:39 +0000] "GET /sites/default/files/advagg_js/js__1394144677.js HTTP/1.1" 404 325 "-" "Drupal (+http://drupal.org/)"
127.0.0.1 - - [06/Mar/2014:22:25:00 +0000] "GET /sites/default/files/advagg_css/css__1394144700.css HTTP/1.1" 404 325 "-" "Drupal (+http://drupal.org/)"
127.0.0.1 - - [06/Mar/2014:22:25:01 +0000] "GET /sites/default/files/advagg_js/js__1394144700.js HTTP/1.1" 404 325 "-" "Drupal (+http://drupal.org/)"
split_account
  • We need more details than "This webpage is not available." Was it DNS resolution error? Was it 404 returned from the webserver? Was it failure to connect to the HTTP server port? – Tero Kilkanen Mar 17 '14 at 23:25
  • Unfortunately I wasn't sure how to check these right away (see my lack of experience), and the website was back up before I found anything useful. I was wondering if I could find out that sort of thing from logs, but judging from your answer I'm guessing that's not possible? – split_account Mar 18 '14 at 00:05

1 Answer

You probably can't find out what happened unless you have (or can get) NetFlow or equivalent logs and go through them (and even then, that's an awful lot of work).
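If you want to double-check the logs you do have, one approach is to narrow the nginx access log to the outage window and see whether any request from your own connection arrived at all. A minimal Python sketch, assuming the combined log format shown in the question; the function name `lines_in_window` is made up for illustration:

```python
import re
from datetime import datetime

# Timestamp as it appears in the access log, e.g. [06/Mar/2014:22:24:20 +0000]
STAMP = re.compile(r"\[([^\]]+)\]")
STAMP_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def lines_in_window(lines, start, end):
    """Yield log lines whose timestamp falls within [start, end].

    start and end must be timezone-aware datetimes.
    Lines without a parseable timestamp are skipped.
    """
    for line in lines:
        match = STAMP.search(line)
        if match is None:
            continue
        try:
            stamp = datetime.strptime(match.group(1), STAMP_FORMAT)
        except ValueError:
            continue
        if start <= stamp <= end:
            yield line
```

You would open your access log, pass the file object as `lines`, and give the start and end of the outage as timezone-aware datetimes. If nothing from your own connection shows up in that window, the requests most likely never reached the server, which points at the network path rather than nginx or PHP. Note that every line in the question's log has 127.0.0.1 as the client address, presumably because a local proxy sits in front of nginx, so the real client address may have to come from somewhere else.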

Generally the best way to handle this kind of thing is to have monitoring in place, and to be armed with the knowledge and tools to check the problem while it's happening. A simple tool which you should install on pretty much any system as a network administrator is "mtr" (or a Windows or Android equivalent). It combines traceroute and ping and shows where along the path network issues creep in.

Another option (though it takes more work or money) is to set up (or purchase) monitoring of your systems, for example using Nagios and Cacti running on a remotely located system.
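If a full Nagios setup is more than you need, even a small script run from a machine outside your network would record which stage failed (DNS lookup, TCP connect, or HTTP response), which is what you'd want to know next time. A rough sketch using only the Python standard library; `probe` and `monitor` are names I made up, and this is an illustration rather than a replacement for a real monitoring tool:

```python
import http.client
import socket
import time


def probe(host, port=80, timeout=5.0):
    """Check one host and return (stage, detail).

    stage is 'dns', 'tcp' or 'http' for the step that failed, or 'ok'.
    """
    # Step 1: can the name be resolved at all?
    try:
        info = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
        addr = info[0][4][0]
    except socket.gaierror as exc:
        return ("dns", str(exc))

    # Step 2: can we open a TCP connection to the web server port?
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            pass
    except OSError as exc:
        return ("tcp", f"{addr}: {exc}")

    # Step 3: does the server answer an HTTP request without an error status?
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException) as exc:
        return ("http", str(exc))
    finally:
        conn.close()
    if status >= 400:
        return ("http", f"HTTP {status}")
    return ("ok", f"HTTP {status}")


def monitor(host, port=80, interval=60.0, rounds=None):
    """Print one timestamped result per interval; rounds=None runs forever."""
    done = 0
    while rounds is None or done < rounds:
        stage, detail = probe(host, port)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp} {host} {stage} {detail}", flush=True)
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

Calling `monitor("mysite.net")` from a remote box and redirecting the output to a file gives you a timeline to compare against what you see from home. It only tells you which stage failed, not where along the path the fault was; for that you still want mtr while the problem is happening.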

While it's impossible to know what caused your outage, among the most likely causes are:

  1. Your DSL connection (or equivalent) disconnected and reconnected, or

  2. There was a routing anomaly: a router went down, so you lost connectivity while BGP reconverged (i.e. found another path). This could have happened anywhere between you and your server.

davidgo