My server is hitting very high CPU load (pegged at almost 100%), so much so that the Apache service can't run and we get Apache 500 errors. We used a script to catch this, and that is how we discovered the pattern: normally the server isn't running any processes that look like "/usr/sbin/exim -Mc 1R6Nvz-0006CN-KI", but whenever the problem occurs we consistently find a large number of them in memory. We contacted HostGator support, and they confirmed that the cause is Exim mail retries (which, they say, is what the -Mc switch is for) and not Apache, MySQL, or any other process. They agree with my decision to focus purely on Exim.
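A check of this kind can be sketched as follows. This is a hypothetical illustration, not the actual script we used: the names `count_exim_deliveries` and `snapshot` are made up for the example, and it assumes Python 3.7+ (a stock CentOS 5 ships a much older Python) and a `ps` that accepts `-eo args`.

```python
import re
import subprocess
from collections import Counter

# Matches an Exim delivery process such as "/usr/sbin/exim -Mc 1R6Nvz-0006CN-KI"
# and captures the message ID that follows the -Mc switch.
EXIM_MC = re.compile(r"exim\s+-Mc\s+(\S+)")


def count_exim_deliveries(ps_lines):
    """Return a Counter mapping message ID -> number of 'exim -Mc <id>' processes."""
    counts = Counter()
    for line in ps_lines:
        match = EXIM_MC.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def snapshot():
    """Return the command line of every running process, one per list entry.

    Assumption: 'ps -eo args' is available (procps-style ps).
    """
    result = subprocess.run(
        ["ps", "-eo", "args"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()
```

Calling `count_exim_deliveries(snapshot())` from cron and logging `sum(counts.values())` along with the timestamp would show when the "exim -Mc" processes pile up and whether the same message IDs keep reappearing.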
HostGator is going to grant me root access to this dedicated host today. I'm brand new to Exim, but I know Linux fairly well. Which logs, email directories, and Exim config files would you recommend I look at to troubleshoot a high volume of Exim mail retries? Note that this is a CentOS 5 Linux server with WHM/cPanel on it.
For instance, things I'd love to see:
- the log file of Exim activity, covering both successes and errors
- one of the emails it's trying to retry, which I'd like to crack open to look for a clue
- the Exim config files, to see if there's a throttle we can apply so that these mail retries are spread over a longer period of time instead of all happening at once