Could someone give me a tip on how to log the first 5 rows of top output? I was thinking about grep, but I don't know how to pick rows.
I need to understand what sometimes freezes the server. Maybe there are some tools for it?
Thanks ;)
You can do what you've requested by:
~$ top -b -n1 | head -5
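To turn that one-liner into something that can be run repeatedly (for example from cron), a minimal sketch could look like the following. The function name log_top_snapshot and the log path are made up for illustration. It assumes a procps-style top that supports batch mode (-b). Note that on such a top the first five lines are usually the summary area (uptime, tasks, CPU, memory, swap), so a larger line count is needed to include processes.

```shell
#!/bin/sh
# Hypothetical helper: append a timestamp plus the first N lines of a single
# batch-mode top snapshot to a log file.
# $1: log file to append to
# $2: number of top lines to keep (defaults to 5)
log_top_snapshot() {
    logfile=$1
    lines=${2:-5}
    {
        date '+%Y-%m-%d %H:%M:%S'
        # -b: batch mode (plain text, no terminal control codes)
        # -n 1: take one snapshot and exit
        top -b -n 1 | head -n "$lines"
    } >> "$logfile"
}

# Example use (the path is an assumption):
#   log_top_snapshot /var/tmp/top-snapshots.log 12
```

Called once a minute, this leaves a timestamped trail that can be read after the server comes back.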
In addition to Janne's list of commands and symcbean's advice to check your system logs, I would suggest atop:
...
It shows the occupation of the most critical hardware
resources (from a performance point of view) on system level, i.e. cpu,
memory, disk and network.
It also shows which processes are responsible for the indicated load
with respect to cpu- and memory load on process level. Disk load is
shown if per process "storage accounting" is active in the kernel or if
the kernel patch `cnt' has been installed. Network load is only shown
per process if the kernel patch `cnt' has been installed...
[...]
Every interval (default: 10 seconds) information is shown about the
resource occupation on system level (cpu, memory, disks and network
layers), followed by a list of processes which have been active during
the last interval (note that all processes that were unchanged during
the last interval are not shown, unless the key 'a' has been pressed).
If the list of active processes does not entirely fit on the screen,
only the top of the list is shown (sorted in order of activity).
Also, with atop you can look back in time at the state of your server, because it stores that data in its log files. For example, I have the following snippet in a script that is launched when the server's load average passes an arbitrary limit. The atop information and other related system info are then sent by mail to my account:
atop -r /var/log/atop.log -M -b "$(date +'%H:%M' -d '30 minutes ago')" -e "$(date -d now +'%H:%M')"
Basically, I get a report of the current state of the server and of what was happening on it during the past 30 minutes (with detailed information for each 10-minute interval).
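The surrounding script is not shown in the answer, so the following is only a sketch of how such a trigger might be wired up. The names load_exceeds and atop_last_30min, the 4.0 limit and the report path are all assumptions. It reads the 1-minute load average from /proc/loadavg, which is Linux-specific, and reuses the atop command from above unchanged.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the 1-minute load average is above a limit.
# $1: limit, e.g. 4.0
# $2: file to read, defaults to /proc/loadavg (Linux-specific)
load_exceeds() {
    awk -v limit="$1" '
        NR == 1 { over = ($1 + 0 > limit + 0) }   # first field = 1-minute load
        END     { exit !over }                    # exit 0 only when over the limit
    ' "${2:-/proc/loadavg}"
}

# Replay the last 30 minutes from the atop log (command taken from the answer;
# date -d is a GNU date extension).
atop_last_30min() {
    atop -r /var/log/atop.log -M \
        -b "$(date +'%H:%M' -d '30 minutes ago')" \
        -e "$(date -d now +'%H:%M')"
}

# Example use from cron (limit and output path are assumptions):
#   load_exceeds 4.0 && atop_last_30min > /tmp/atop-report.txt
```

The report file can then be mailed or archived with whatever tooling is already on the machine.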
First, it's unlikely that the server freezing is due to user processes. Second, even if that were the case, why would it be one of the top 5 causing the problem?
If you've already checked your logs and found nothing, then try implementing a watchdog to write a heartbeat to the logs - and check that the system really is freezing (as opposed to being observed to pause when remotely accessed).
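One simple way to do the heartbeat part, as a sketch only: write a timestamp to a file at a fixed interval and later look for gaps larger than that interval. The function names and the file path are hypothetical, and date '+%s' (seconds since the epoch) is a widely supported extension rather than strict POSIX.

```shell
#!/bin/sh
# Hypothetical helper: append one heartbeat line (epoch seconds) to a file.
# $1: heartbeat file
write_heartbeat() {
    date '+%s' >> "$1"
}

# Hypothetical helper: print every gap between consecutive heartbeats that is
# longer than max_gap seconds, as "previous current difference".
# $1: heartbeat file
# $2: max_gap in seconds
find_gaps() {
    awk -v max="$2" '
        NR > 1 && $1 - prev > max { print prev, $1, $1 - prev }
        { prev = $1 }
    ' "$1"
}

# Example use (path and interval are assumptions):
#   while :; do write_heartbeat /var/tmp/heartbeat.log; sleep 10; done &
#   find_gaps /var/tmp/heartbeat.log 15
```

If the machine really stalls, the gaps show up locally in the file. If the file has no gaps while remote sessions appeared frozen, the network path is the more likely suspect.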
How long does the freeze last? How frequent is it? Is there a blip in the load when it comes back? Where are you observing these freezes? Is this a dedicated or virtual machine?
top | head -n 12 >>your_file.txt
would do it, but with caveats: it would save all the control and ANSI characters as well, so browsing the output with less etc. would not be pleasant (running top in batch mode, top -b -n 1, avoids them). And, to be honest, it is not nearly the best way to catch the rebel process.
For overall trends in your server's performance, tools like snmp+mrtg, Cacti or Munin can be very useful: they graph CPU usage, memory usage and so on. For command-line use, the sysstat package with its sadc data collector and sar reporting utility can also give you a view of how your server is doing. The psacct package adds BSD process accounting and can provide overall statistics about per-user/process time spent.
For a real-time view, just stay logged in on your server and follow tools like iostat -x 1, vmstat and top.
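Since a freeze may kill the interactive session, it can help to also keep the output of such tools on disk with a timestamp per line. A small sketch follows; stamp_lines is a made-up helper name and the log path is an assumption. Depending on the tool, output may arrive in bursts if it buffers when writing to a pipe.

```shell
#!/bin/sh
# Hypothetical helper: prefix every line read from stdin with the current
# date and time, so periodic tool output can be correlated with a freeze.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Example use (interval and path are assumptions):
#   vmstat 5 | stamp_lines >> /var/tmp/vmstat.log
```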
Consider setting up a separate syslog server. If you have spare hardware (or a central syslog server already installed), use it! You just configure your server to send its syslogs and other logs to your syslog server. Sometimes a freeze (if caused by a kernel panic) leaves your server unable to write information to disk but still able to send its last words to the syslog server.
Test your server with memtest86. If you can take your server offline for a night or so, let it run memtest86 to see if it has faulty RAM or something else.
Sorry, I'm unable to be more helpful than this since you did not actually tell us much about your problem.