I have a server running an older version of CentOS (5). About once a day the server becomes inaccessible for 4-5 minutes, sometimes longer, and then becomes available again on its own. It is very strange.

I checked /var/log/messages and /var/log/secure and do not see anything happening at the times when it was inaccessible. For example, today I ssh'd in around 7:50 AM. While I was looking around the server, my terminal hung. I closed the terminal and tried to reconnect, but could not. I then tried to reach the website running on that server with a browser and could not. At approximately 8:03, everything seemed back to normal.

My question is: what logs, systems or files should I check to determine why this keeps happening? BTW, it often happens around the same time each day, but not exactly.

Thanks for any tips or pointers.

Doug Wolfgram

1 Answer

I just recently installed sar (System Activity Report) on a CentOS 5 machine so I could have some idea what was going on when the web server stopped responding to requests. I haven’t yet fully explored it (other than verifying that the cron job is recording the system information every 10 minutes) but here’s some basic info and pointers on it.

According to the sar Wikipedia article:

sar (System Activity Report) is a Solaris-derived system monitor command used to report on various system loads, including CPU activity, memory/paging, device load, network.

In Linux distributions, it’s provided by the sysstat package.

"Easy system monitoring with SAR" from IBM has a good introduction to using sar.

This Softpanorama article is also by Sean Walberg, the author of the IBM article above.
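Since sar is recorded from cron at a fixed interval, one thing worth looking for is missing samples around the time of an outage. A gap would suggest the machine itself stalled, while an unbroken series would point more towards the network. Here is a minimal sketch in Python of that check. The function name `find_gaps` and the 2-minute slack are my own assumptions, and turning sar output into `datetime` values is left out because the time format depends on the locale.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected=timedelta(minutes=10),
              slack=timedelta(minutes=2)):
    """Return (earlier, later) pairs where consecutive samples are
    further apart than the expected interval plus some slack.

    `timestamps` is any iterable of datetime objects, one per sample.
    """
    ordered = sorted(timestamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > expected + slack:
            gaps.append((earlier, later))
    return gaps
```

For example, samples at 07:30, 07:40, 08:10 and 08:20 would be reported as one gap between 07:40 and 08:10.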

Anthony Geoghegan
  • I recommended it to my sys admin. :) We did discover that there were dropouts in connectivity so we suspect a NIC issue. – Doug Wolfgram Feb 08 '16 at 18:54
  • It could still very easily be load related, especially if it happens at the same time every day. Examine cron jobs. NIC issues should be corroborated with error counts from the switch, or shown in ifconfig, ethtool or netstat -s. Don't worry about non-zero values for attributes that sound bad. Worry if they are growing. You can also leave an mtr running on your desktop against the server's IP while you're ssh'd in. The mtr would reveal whether a routing problem is causing your dropouts. – Billy left SE for Codidact Oct 22 '17 at 09:34
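
The "worry if they are growing" advice in the last comment can be checked by comparing two snapshots of the interface counters taken some time apart. Below is a minimal sketch in Python that does this with the text of /proc/net/dev, which holds the same counters ifconfig reports. It assumes Python 3 is available on the server, which is not the default on CentOS 5. The names `parse_net_dev`, `growing_counters` and the `rx_`/`tx_` key prefixes are my own labels, not part of any tool.

```python
# Column order of /proc/net/dev after the "iface:" prefix.
RX_FIELDS = ["bytes", "packets", "errs", "drop",
             "fifo", "frame", "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop",
             "fifo", "colls", "carrier", "compressed"]
KEYS = ["rx_" + f for f in RX_FIELDS] + ["tx_" + f for f in TX_FIELDS]

# Counters that sound bad; traffic counters are expected to grow.
WATCHED = ("rx_errs", "rx_drop", "rx_fifo", "rx_frame",
           "tx_errs", "tx_drop", "tx_fifo", "tx_colls", "tx_carrier")


def parse_net_dev(text):
    """Parse the contents of /proc/net/dev into {iface: {key: int}}."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, _, rest = line.partition(":")
        values = rest.split()
        if len(values) < len(KEYS):
            continue  # not a counter line in the expected layout
        counters[name.strip()] = dict(zip(KEYS, map(int, values)))
    return counters


def growing_counters(before, after):
    """Return {iface: {key: delta}} for watched counters that increased."""
    growth = {}
    for iface, now in after.items():
        then = before.get(iface)
        if then is None:
            continue  # interface appeared between snapshots
        deltas = {k: now[k] - then[k] for k in WATCHED if now[k] > then[k]}
        if deltas:
            growth[iface] = deltas
    return growth
```

To use it, read /proc/net/dev once before and once after an outage window, pass both texts through `parse_net_dev`, and hand the results to `growing_counters`. An empty result means none of the watched error counters moved in that window.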