A Linux server I administer has lately been suffering serious performance degradation, which becomes particularly apparent several weeks after each reboot. In particular, Jenkins jobs allocated to the machine begin timing out a few weeks after a reboot, at which point everything on the machine is sluggish; ssh, for example, takes 10-15 seconds to connect.
The output of the uname command is:
root@_____:~# uname -a
Linux _____ 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
Looking at vmstat and free, it appears that almost all of the machine's physical memory is in use, with little of it used for caching and little swap in use:
root@_____:~# free -m
             total       used       free     shared    buffers     cached
Mem:          3865       3686        179          0         12        282
-/+ buffers/cache:       3391        473
Swap:         4102        504       3597
root@_____:~# vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  2 953288 217960   6344 232268   41   35   111   106    0    0 42  3 52  3  0
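For what it's worth, I assume /proc/meminfo can break down where that "used" memory actually sits (anonymous process memory, page cache, kernel slab, and so on); something along these lines is what I would check, though I'm not sure which fields should be considered suspicious:
grep -E '^(MemTotal|MemFree|Buffers|Cached|AnonPages|Mapped|Slab|SReclaimable|SUnreclaim|PageTables)' /proc/meminfo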
top reports, however, that even the system's greatest consumers of memory are using only single-digit percentages of the available memory; sorting ps output by resident set size shows the same:
root@moose:~# ps aux --sort -rss
USER       PID %CPU %MEM     VSZ    RSS TTY      STAT START   TIME COMMAND
regress+  6766 13.1  6.1 1894448 245188 ?        Sl   10:10   5:25 /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java _____
regress+ 22796  1.1  3.1 2552832 126600 ?        Sl   Apr07  11:17 /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java _____
regress+ 17199  0.6  0.5  188952  23560 ?        Dl   10:51   0:00 _____
regress+ 23497  0.5  0.5 3057724  21764 ?        Sl   06:26   1:29 java _____
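For reference, summing the RSS column across all processes (a rough approximation, since RSS double-counts shared pages) should show whether userspace even comes close to accounting for the ~3.4 GB that free reports as used; this is the kind of check I have in mind:
ps aux | awk 'NR>1 {sum += $6} END {printf "total RSS: %.0f MiB\n", sum/1024}'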
My theory at the moment is that some sort of resource leak has locked up large chunks of physical memory, causing the system to thrash and slow down. Is this a likely cause of the performance degradation? If so, how would I best go about resolving it, and if not, what is another likely cause of the problem?
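If the memory is being held by the kernel rather than by processes, I assume something like the following would reveal it, although I'm not confident about interpreting the results (the drop_caches write only discards reclaimable caches, so anything that stays "used" afterwards would presumably be leaked or pinned):
slabtop -o | head -n 15                     # one-shot snapshot of the largest kernel slab caches
sync && echo 3 > /proc/sys/vm/drop_caches   # discard reclaimable page cache and slab, then re-check free -m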