0

I have absolutely no idea why the server crashed. The only unusual thing I found is the following graph from munin:

[Image: munin graph of memory usage]

Please don't tell me I need more RAM. As you can see, everything was stable before the incident occurred. I just don't understand why the server suddenly crashed, or why the memory demand suddenly became so high.

Pierre.Vriens
TheOnly92
  • The logs for that day would be helpful. Unfortunately logrotate only keeps 5 rotations' worth of backups, and rotates each day by default. – Ignacio Vazquez-Abrams Aug 28 '10 at 06:31
  • Check the logs; what do they say? Also check your hard drives for bad sectors, which could lead to a crash. I see that committed memory (allotted to applications) is through the roof. `Committed memory is, essentially, all the memory which has been allocated by applications`. You may want to increase the swap space. See this: http://www.linuxquestions.org/questions/linux-server-73/committed-memory-keep-on-increasing-828116/ – pbu Mar 07 '16 at 15:33

2 Answers

0

First check dmesg and the system logs for a kernel panic or memory-related messages. It looks like an application is using all your memory. Try this script, which logs your process list to a file so you can see what caused the problem:

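#!/bin/bash
# Log a timestamped process list once a minute so the process that
# exhausts memory can be identified after a crash.
# -p: do not fail if the directory already exists
mkdir -p /tmp/mem_log
while true; do
   date "+%Y-%m-%d %H:%M:%S"
   # Largest memory consumers first
   ps aux --sort=-%mem
   sleep 60
done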

and execute it like this:

mkdir -p /tmp/mem_log
nohup ./mem_log.sh > /tmp/mem_log/mem_log.log 2>&1 &

After the next server crash, check the log to see which process used all your memory. This is a memory problem, but not because you have too little memory; a faulty process is causing it.
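Reading through a day of `ps aux` snapshots by hand is tedious, so here is a rough sketch of how the log could be summarized. It assumes the log has exactly the layout the script above produces (a `date` line followed by `ps aux` output). The function name `top_mem_per_snapshot` is made up for this example; it prints, for each snapshot, the process with the largest RSS.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original answer): for each snapshot in
# the log, print the timestamp, RSS in KiB, PID and command of the process
# with the largest resident set size. Reads the log on stdin.
top_mem_per_snapshot() {
    awk '
        # A line written by date "+%Y-%m-%d %H:%M:%S" starts a new snapshot.
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
            if (ts != "" && best != "") print ts, best
            ts = $0; max = -1; best = ""
            next
        }
        # Skip the ps header line.
        $1 == "USER" && $2 == "PID" { next }
        # In ps aux output, column 6 is RSS (KiB) and column 11 starts the command.
        ts != "" && NF >= 11 && $6 + 0 > max {
            max = $6 + 0
            best = $6 " KiB pid=" $2 " " $11
        }
        END { if (ts != "" && best != "") print ts, best }
    '
}

# Summarize a log file only when one is passed as the first argument.
if [ -n "${1:-}" ]; then
    top_mem_per_snapshot < "$1"
fi
```

Saved as, say, `mem_summary.sh` (again a made-up name), it could be run as `./mem_summary.sh /tmp/mem_log/mem_log.log`. The last few lines before the crash should show which process was growing.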

MihaiM
  • Will try that. Thanks. Just one question, if I save the log file in the /tmp/ directory, wouldn't it be deleted if I reboot the server (after it crashed)? – TheOnly92 Aug 28 '10 at 09:59
  • yes, sorry, you are right, save it in your home dir. – MihaiM Aug 28 '10 at 11:56
0

You may want to install psmon and have it report or kill misbehaving, memory-hungry processes. Psmon logs or emails the events it reacts to, so you can easily find out which process is the culprit.
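psmon's own configuration isn't shown here, so the following is not psmon. It is only a hand-rolled stand-in that sketches the same idea: flag any process whose share of memory exceeds a threshold. The function name `report_mem_hogs` and the default limit of 50 percent are assumptions made for this example.

```shell
#!/bin/bash
# Hypothetical stand-in, NOT psmon: list processes above a %MEM threshold.
# Reads "ps aux"-style output on stdin; the threshold in percent is the
# first argument and defaults to 50 (an arbitrary illustrative value).
report_mem_hogs() {
    awk -v limit="${1:-50}" '
        # NR > 1 skips the ps header; column 4 of ps aux is %MEM.
        NR > 1 && $4 + 0 > limit + 0 {
            print "pid=" $2, "user=" $1, "mem=" $4 "%", $11
        }
    '
}
```

Something like `ps aux | report_mem_hogs 50` run from cron would give a crude report; a real monitor such as psmon additionally handles notification and killing.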

Janne Pikkarainen