I have a CentOS 5.3 based server with kernel 2.6.18-128.2.1.el5. It worked fine for nearly a month, but this week it went down three times. Each time I saw the outage in Nagios and emailed the datacenter to reboot the server. It then ran for 12-36 hours before going down again.
I looked through the log files. Just before the first fault, /var/log/messages contained this message:
logrotate: ALERT exited abnormally with [1]
After rebooting the server the second time, the sysadmin from the datacenter sent me this screenshot:
Screenshot: http://www.freeimagehosting.net/uploads/bd9fb68d98.png
Before the third fault, /var/log/messages contained this message:
Eeek! page_mapcount(page) went negative (-1)
How should I investigate the problem?
UPD:
Part of the memtester output:
Compare OR :
FAILURE: 0x7e9f90d1 != 0x7e9fd2d1 at offset 0x06222609.
FAILURE: 0x7e9f90d1 != 0x7e9fd0d1 at offset 0x06222621.
FAILURE: 0x7e9f90d1 != 0x7e9fd1d1 at offset 0x06222661.
FAILURE: 0x7e9f90d1 != 0x7e9f92d1 at offset 0x06222681.
FAILURE: 0x7e9f90d1 != 0x7e9fd0d1 at offset 0x062226a1.
FAILURE: 0x7e9f90d1 != 0x7e9fd0d1 at offset 0x062226c1.
FAILURE: 0x7e9f90d1 != 0x7e9f93d1 at offset 0x062226e9.
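To see which bits differ in the memtester failures quoted above, the two values on each line can be XORed. The following is only a rough sketch in Python: the names `MEMTESTER_OUTPUT`, `FAILURE_RE`, `differing_bits` and `analyze` are made up for this illustration, and the regular expression assumes the failure lines look exactly like the ones pasted here.

```python
import re

# Failure lines copied from the memtester output quoted in this post.
MEMTESTER_OUTPUT = """\
FAILURE: 0x7e9f90d1 != 0x7e9fd2d1 at offset 0x06222609.
FAILURE: 0x7e9f90d1 != 0x7e9fd0d1 at offset 0x06222621.
FAILURE: 0x7e9f90d1 != 0x7e9fd1d1 at offset 0x06222661.
FAILURE: 0x7e9f90d1 != 0x7e9f92d1 at offset 0x06222681.
FAILURE: 0x7e9f90d1 != 0x7e9fd0d1 at offset 0x062226a1.
FAILURE: 0x7e9f90d1 != 0x7e9fd0d1 at offset 0x062226c1.
FAILURE: 0x7e9f90d1 != 0x7e9f93d1 at offset 0x062226e9.
"""

# Assumed line format: "FAILURE: 0x<value> != 0x<value> at offset 0x<offset>"
FAILURE_RE = re.compile(
    r"FAILURE: 0x([0-9a-fA-F]+) != 0x([0-9a-fA-F]+) at offset 0x([0-9a-fA-F]+)"
)


def differing_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


def analyze(text):
    """Return a list of (offset, xor_mask, differing_bit_positions) tuples."""
    results = []
    for match in FAILURE_RE.finditer(text):
        a, b, offset = (int(group, 16) for group in match.groups())
        results.append((offset, a ^ b, differing_bits(a, b)))
    return results


for offset, mask, bits in analyze(MEMTESTER_OUTPUT):
    print(f"offset 0x{offset:08x}: xor 0x{mask:08x}, differing bits {bits}")
```

For the seven lines above, every difference falls in bits 8, 9 and 14, so the mismatches are confined to a single byte of the 32-bit word.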
It is faulty memory. Thank you for the help!