The first step would be to isolate where the failure is happening. It sounds like you were able to connect to the server during the outage, so it seems unlikely to me that there was a general server failure or a server-local network problem.
The first thing I would do if my web browser were unable to bring up the page is establish whether port 80 is responding to connection attempts. The easiest way to do that is with telnet, e.g. (assuming you're on something Unix-like):
$ telnet your.server.name 80
Try it out against servers you know are working so you can see what a successful connection looks like. For www.google.com, e.g., I get:
$ telnet www.google.com 80
Trying 74.125.95.103...
Connected to www.l.google.com.
Escape character is '^]'.
(To exit telnet from this state, press Ctrl-], then type quit and hit Enter.)
Failures you might see include DNS failure:
$ telnet fake.dns.entry 80
telnet: could not resolve fake.dns.entry/80: Name or service not known
In which case you would follow up by trying to connect to the IP address.
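A sketch of that follow-up, with your.server.name standing in as a placeholder for your real hostname:

```shell
# "your.server.name" is a placeholder -- substitute your real hostname.
# getent goes through the same resolver libraries most programs use,
# so it fails the same way a browser would:
getent hosts your.server.name || echo "DNS lookup failed"
```

If the lookup fails but you have the server's IP address on record (from your hosting provider, say), repeat the telnet test against the raw IP to take DNS out of the picture entirely.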
Another failure possibility is a refused or timed-out connection:
$ telnet serverfault.com 99
Trying 64.34.119.12...
telnet: Unable to connect to remote host: Connection timed out
A quick "Connection refused" means the host is reachable but nothing is listening on that port (the server, or a load balancer in between, isn't bound to it); a long hang ending in "Connection timed out" more often means a firewall somewhere along the path is silently dropping your packets. You might also see:
$ telnet 192.168.0.237 80
Trying 192.168.0.237...
telnet: Unable to connect to remote host: No route to host
Which means the server doesn't exist at the address you thought it did, or there's a network routing problem in between.
You should first test this from outside the network the server is on, preferably from a machine several network hops away on a different ISP. Then try it from the server's local network. Finally, try it from the server itself, using "localhost" in place of the hostname (assuming your web server is configured to listen on the loopback interface).
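If telnet isn't installed on the server itself, bash can make the same loopback check with its built-in /dev/tcp pseudo-path. This is a bash-only sketch, and port 80 assumes the default nginx listen port:

```shell
#!/bin/bash
# bash-only sketch: /dev/tcp is a bash builtin pseudo-path, not a real
# file. The test succeeds only if something on this machine is
# accepting connections on port 80 (the default nginx listen port).
if (exec 3<>/dev/tcp/localhost/80) 2>/dev/null; then
    echo "port 80 open on loopback"
else
    echo "port 80 closed on loopback"
fi
```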
Once you know the pattern of failures, you can start narrowing down where the problem lies. My gut instinct is that nginx or your FastCGI backend is the root cause, rather than some intermittent network problem that somehow doesn't affect SSH traffic, but it isn't really possible to troubleshoot further without answering the network question first.
Hope this gives you some ideas of what to start with next time. Good luck.
Update
I just noticed your side question about the best way to "consume" log files. If you are in the middle of troubleshooting the problem, I recommend tail. Open two ssh sessions on the server; in one run tail -f /var/log/nginx/access_log and in the other tail -f /var/log/nginx/error_log (or whatever the paths are on your system).
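If juggling two ssh sessions is a nuisance, tail can also follow both files at once from a single session (the paths here are the same assumed ones as above; adjust for your system):

```shell
# tail prints a "==> filename <==" header before each file's output,
# so the interleaved lines from the two logs stay distinguishable:
tail -f /var/log/nginx/access_log /var/log/nginx/error_log
```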
If you need to dig through a dense log file after the fact, a good tool to start with is less. Run less /var/log/nginx/error_log, then press Space to page down, b to page up, and / to start a search; n jumps to the next match, N to the previous one, and q exits back to the shell.
I would guess there are better tools for particular log formats, but tail and less usually get me about 90% of what I need when troubleshooting from logs.