
A Linux server I administer has recently been experiencing serious performance degradation, which becomes particularly apparent several weeks after each reboot. In particular, Jenkins jobs allocated to the machine begin timing out several weeks after a reboot, at which point all execution is sluggish; for example, ssh takes 10-15 seconds to connect to the machine.

The output of the uname command is

root@_____:~# uname -a
Linux _____ 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Looking at vmstat and free, it appears that almost all of the machine's physical memory is in use, with little going to caching and only a small amount of swap in use.

root@_____:~# free -m
         total       used       free     shared    buffers     cached
Mem:          3865       3686        179          0         12        282
-/+ buffers/cache:       3391        473
Swap:         4102        504       3597

root@_____:~# vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
0  2 953288 217960   6344 232268   41   35   111   106    0    0 42  3 52  3  0

top and ps, however, report that even the system's greatest consumers of memory are using only single-digit percentages of the available memory.

root@moose:~# ps aux --sort -rss
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
regress+  6766 13.1  6.1 1894448 245188 ?      Sl   10:10   5:25 /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java _____
regress+ 22796  1.1  3.1 2552832 126600 ?      Sl   Apr07  11:17 /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java _____
regress+ 17199  0.6  0.5 188952 23560 ?        Dl   10:51   0:00 _____
regress+ 23497  0.5  0.5 3057724 21764 ?       Sl   06:26   1:29 java _____
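
A quick way to see whether userspace processes account for the used memory at all is to total the RSS column across every process and compare it with what free reports as used. This is only a rough sketch (shared pages are counted once per process, so the total overestimates), but a gap of multiple gigabytes would mean the memory is being held somewhere other than userspace:

# Sum resident set size (RSS, column 6 of ps aux, in KiB) across all
# processes and print the total in MiB; compare against "used" from free -m.
ps aux | awk 'NR>1 {sum += $6} END {printf "total RSS: %d MiB\n", sum/1024}'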

My theory at the moment is that some sort of resource leak has locked up large chunks of physical memory, causing the system to thrash and slow down. Is this the likely cause of the performance degradation? If so, how would I best go about resolving it, and if not, what is another likely cause of the problem?
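
To confirm the thrashing part of this theory (as opposed to the machine merely holding stale pages in swap), the standard check is to watch the swap-in/swap-out rates while the slowness is happening; a minimal sketch:

# Sample memory and swap activity every 5 seconds, 10 times. Sustained
# non-zero si/so columns (pages swapped in/out per second) while things
# are slow would indicate active thrashing rather than idle swap usage.
vmstat 5 10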

  • If your system supports it, try the htop command to see what is bogging it down. – SDsolar Apr 08 '17 at 00:54
  • `htop` showed the same as `ps aux`, that processes consuming the most resources still didn't use more than 10-20% CPU and 5-10% memory. – Benjamin George Roberts Apr 08 '17 at 00:57
  • If you have 200 items using 0.5% of memory, that is 100% of the available memory, so exactly how many processes using 0.5% of memory exist? What about the swap statistics? If you are using swap frequently, you probably need more memory. Is this a physical system or a virtual server? – Matt Apr 08 '17 at 23:10
  • You should not just look at the system's information when it is getting sluggish; you should look at it regularly to see how it changes over time. In other words, set up regular system monitoring. – Jenny D Apr 12 '17 at 12:43
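
Regarding the monitoring suggestion in the last comment, a minimal sketch of one way to do it (this assumes a system crontab such as /etc/crontab with its user field, and a writable /var/log; the log file name is arbitrary):

# Append a timestamped memory/swap snapshot every 15 minutes so the
# decline can be tracked over the weeks between reboots.
*/15 * * * * root (date; free -m; vmstat 1 2) >> /var/log/mem-snapshots.log 2>&1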

1 Answer


Are there any non-standard kernel modules loaded? Kernel allocations may be taking up the memory; memory used by the kernel itself does not show up in the per-process figures from ps or top.
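
A minimal sketch of how to check this (standard tools, nothing specific to this machine):

# List loaded kernel modules; compare against a freshly rebooted machine
# or another stock Ubuntu host to spot anything non-standard.
lsmod

# Memory held by the kernel itself, which ps/top never attribute to a process:
grep -E 'Slab|SReclaimable|SUnreclaim|VmallocUsed' /proc/meminfo

# One-shot view of the largest slab caches; a cache that keeps growing over
# weeks of uptime points at the leaking subsystem or module.
slabtop -o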

– hayalci