1

We have a mail service with the following components:

    1. CentOS 6.4
    2. Postfix 2.6.6
    3. Roundcube 0.8
    4. Dovecot 2.0.9.7
    5. mysql-server 5.1.71

Everything works, but at peak usage the number of sleeping Roundcube connections (to MySQL) rises from 1-3 to about 270 in less than 10 minutes, and the number of files opened by Apache (measured with lsof) rises from 4000 to 20000 over the same period.
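For anyone who wants to track the sleeping connections over time, here is a minimal sketch. It assumes the rows come from MySQL's SHOW PROCESSLIST (fetched with whatever client you use) as dicts with `User` and `Command` keys. The user name `roundcube` and the sample rows are made up for illustration.

```python
from collections import Counter

def count_sleeping(rows):
    """Count idle ("Sleep") connections per MySQL user.

    `rows` is an iterable of dicts shaped like SHOW PROCESSLIST rows.
    """
    return Counter(row["User"] for row in rows if row["Command"] == "Sleep")

# Hypothetical sample rows, not taken from the real server.
sample = [
    {"Id": 1, "User": "roundcube", "Command": "Sleep", "Time": 120},
    {"Id": 2, "User": "roundcube", "Command": "Sleep", "Time": 35},
    {"Id": 3, "User": "roundcube", "Command": "Query", "Time": 0},
    {"Id": 4, "User": "postfix", "Command": "Sleep", "Time": 5},
]

sleeping = count_sleeping(sample)
print(sleeping["roundcube"])
```

Logging this number once a minute would show whether the jump to 270 is gradual or sudden.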

This is the Apache configuration (Apache runs in prefork mode):

PidFile run/httpd.pid
Timeout 60
KeepAlive On
MaxKeepAliveRequests 100
<IfModule prefork.c>
StartServers       8
MinSpareServers    5
MaxSpareServers   20
ServerLimit      256
MaxClients       256
MaxRequestsPerChild  4000
</IfModule>
TraceEnable off
LimitRequestLine 1024
LimitRequestFields 100
LimitRequestFieldsize 1024
LimitRequestBody 10241024

And here is the MySQL configuration:

secure_auth=1
local_infile=0
max_connections        = 600
max_allowed_packet    = 16M
key_buffer        =256M
wait_timeout=240
interactive_timeout=180
connect_timeout=10
innodb_buffer_pool_size=2G

When the number of sleeping Roundcube connections exceeds 100, almost all services (web, mail, MySQL) go down.

Thanks for any suggestions.

  • 1. How did you install PHP? 2. How much RAM and CPU do you have? `free -m;cat /proc/cpuinfo;` – PersianGulf Apr 10 '14 at 09:09
  • You have to decide whether to use Apache prefork or worker, so read the following link: http://codebucket.co.in/apache-prefork-or-worker/ – PersianGulf Apr 10 '14 at 09:13
  • Is it MySQL that goes down, or Apache, or the whole machine? – PersianGulf Apr 10 '14 at 09:16
  • PHP was installed from the repository (PHP 5.2.10). 256G RAM, 64 CPU cores at 2.4GHz – Sassan torabkheslat Apr 10 '14 at 12:31
  • At peak time all services go down – Sassan torabkheslat Apr 10 '14 at 12:32
  • @MohsenPahlevanzadeh aziz! I know what worker and prefork modes are, and we decided to use prefork. Everything was OK until 3 months ago, but now.... – Sassan torabkheslat Apr 10 '14 at 12:46
  • It looks like "opened-released connections": you are emulating a DoS attack (not a SYN flood, more like OpenDNS), not you yourself but your kernel. I have a suggestion: telnet from a specific machine and test whether your IP dies or not. If it dies, be sure to test your kernel and your boot process. – PersianGulf Apr 10 '14 at 15:15
  • Is it running under cluster management? Do you have any logs from the time of the crash? Please also paste the RAM and CPU state at crash time – Behrad Irani Apr 11 '14 at 01:30
  • We use Red Hat Cluster Suite in failover mode, but we froze the service for some reasons. The server doesn't crash, but services like Apache and MySQL go down (the CLI works too slowly, because Apache opens too many file descriptors, like these: Wed Apr 9 13:30:18 IRDT 2014 22288 apache, 23 clam, 29 dbus, 175 dovecot, 3033 dovenull, 52 haldaemon, 93 luci, 1986 mail, 70 mysql, 29 ntp, 1342 postfix, 35 ricci, 6513 root, 1 USER) – Sassan torabkheslat Apr 11 '14 at 10:31
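
The per-user counts in the last comment look like lsof output grouped by its USER column (the "1 USER" entry would be the header line). How they were actually produced isn't stated. A rough sketch of that kind of grouping, assuming the default lsof layout where USER is the third whitespace-separated field. The sample text below is invented.

```python
from collections import Counter

def open_files_per_user(lsof_output):
    """Count lsof lines per user, skipping the header line."""
    counts = Counter()
    for line in lsof_output.splitlines():
        fields = line.split()
        # Skip blank/short lines and the "COMMAND PID USER ..." header.
        if len(fields) < 3 or fields[0] == "COMMAND":
            continue
        counts[fields[2]] += 1
    return counts

# Invented sample in the default lsof column layout.
sample = """\
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
httpd 2101 apache cwd DIR 253,0 4096 2 /
httpd 2101 apache 4u IPv6 18233 0t0 TCP *:http (LISTEN)
mysqld 1850 mysql 10u IPv4 17001 0t0 TCP *:mysql (LISTEN)
"""

per_user = open_files_per_user(sample)
for user, count in per_user.most_common():
    print(count, user)
```

On a live machine the input could come from `subprocess.run(["lsof"], capture_output=True, text=True).stdout`. Some lsof versions add a TID column, which would shift the field index.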

2 Answers

0

The answer is:

I lowered Apache's MaxClients setting from 256 to 50. Why?

For a (still) unknown reason, all the preforked Apache processes take about 100% CPU (each process uses 100% of the core it runs on for a few moments).

So the system goes down: it has 64 CPU cores, and when all 256 Apache processes use 100% CPU, the system and its services go down.

The issue still exists, but the services no longer have problems. I think the issue is related to network attacks (our monitoring tools report many attacks per day), which sometimes cause problems such as resource locking or something else.

Thank you for all the suggestions.
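
The arithmetic behind that reasoning, as a back-of-the-envelope sketch. It assumes the worst case, where every Apache child is busy at 100% of a core at the same moment. The helper name `cpu_oversubscription` is made up.

```python
def cpu_oversubscription(busy_processes, cores):
    """Return how many fully busy processes compete for each core."""
    return busy_processes / cores

cores = 64  # from the comments above
before = cpu_oversubscription(256, cores)  # MaxClients 256
after = cpu_oversubscription(50, cores)    # MaxClients 50

print(f"MaxClients 256: {before:.2f} busy processes per core")
print(f"MaxClients 50:  {after:.2f} busy processes per core")
```

With 256 children there are 4 busy processes per core and nothing is left for MySQL, Dovecot or Postfix. With 50 children, fewer than one per core, some cores stay free.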

0

Now

After about 5 years

The problem has been identified and was solved in a few days.

It was so complicated for a Jr. System administrator like me ;)

There was a problem in the GFS2 cluster file system that my teammate had set up on an iSCSI LUN, and it led to various problems in Dovecot and Roundcube (and then Apache).

For your information: when I paid attention to the %wa value in the top command (it was about 90%), I thought there might be a problem at the filesystem level.

Then I decided to transfer all data to a new cluster file system (OCFS2), because GFS was deprecated!

First, all data was moved to the new cluster file system (OCFS2); then the whole system was redesigned around Pacemaker and HAProxy on Debian wheezy!
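
For reference, the %wa figure can also be estimated from two samples of the aggregate `cpu` line in /proc/stat. This is only a sketch of the idea, not how top itself is implemented. The two sample lines are invented, and only the first eight counters (user through steal) are summed.

```python
def parse_cpu_line(line):
    """Return the first eight counters (user..steal) of the 'cpu' line."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:9]]

def iowait_percent(before_line, after_line):
    """Share of CPU time spent in iowait between two /proc/stat samples."""
    before = parse_cpu_line(before_line)
    after = parse_cpu_line(after_line)
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    if total == 0:
        return 0.0
    # Counter order: user, nice, system, idle, iowait, ...
    return 100.0 * deltas[4] / total

# Invented samples, taken a few seconds apart on a hypothetical machine.
sample_before = "cpu  1000 0 500 8000 500 0 0 0 0 0"
sample_after = "cpu  1050 0 550 8000 1400 0 0 0 0 0"

wa = iowait_percent(sample_before, sample_after)
print(f"iowait: {wa:.1f}%")
```

A value that stays this high means the CPUs are mostly waiting on storage, which is what pointed toward the filesystem.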