
I have a mailfilter system based on that described at: http://www200.pair.com/mecham/spam/ (Debian Lenny, postfix / amavisd-new / spamassassin / policy-weightd, etc.)

This system has been running flawlessly for us for the past few years (on Etch first, then rebuilt on Lenny after it was released).

Then, in the past week, the amavisd-new process has started locking up in such a way that:

  • There are still amavisd processes running, all marked with "(accept)" as opposed to "(avail)".
  • I can still make a telnet connection to the amavis port, but it sits there connected with no response.
  • Running amavisd-nanny locks up my ssh session, and I have to abort and start a new ssh session.
  • Curious log entries such as "amavis[25474]: (25474-20) Requesting process rundown after 20 tasks" seem to occur before a process gets frozen in the "(accept)" state.
  • This has been happening on both of our (nearly identical) mailfilters, starting at the same time (about the time of the below-mentioned libaprutil1 upgrade).
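The "connected but silent" symptom can also be checked from a script rather than by hand with telnet. The following is only a rough sketch: `probe_banner` is a made-up helper name, and it assumes bash (for `/dev/tcp`) and the coreutils `timeout` command are available.

```shell
#!/bin/bash
# probe_banner HOST PORT [SECONDS]   (hypothetical helper, not part of amavisd-new)
# Prints the first line the server sends. Returns non-zero if the connection
# fails or if no line arrives within SECONDS (default 5).
probe_banner() {
    local host=$1 port=$2 wait=${3:-5}
    # /dev/tcp is a bash feature; the socket is closed when the child shell exits.
    timeout "$wait" bash -c '
        exec 3<>"/dev/tcp/$1/$2" || exit 1
        IFS= read -r banner <&3 || exit 1
        printf "%s\n" "$banner"
    ' probe "$host" "$port" 2>/dev/null
}
```

For example, `probe_banner 127.0.0.1 10024 10 || echo "no greeting from amavisd"`, where 10024 is only an assumed port; use whatever port your amavisd actually listens on.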

I haven't been able to discover much on my own, and am wondering if anyone here is facing the same thing?

Can anybody point me in the right direction on this?

Brent

2 Answers

0

The "newer" versions of amavisd-new will wedge over time. I encounter this about every 5-6 months. I haven't traced the issue directly, but it appears to be a problem with the Perl installation that comes with RHEL/CentOS (which is what I am running).

I can tell you that you'll want to adjust the child lifetime down quite a bit, say 10-30 requests per child (the $max_requests setting in amavisd.conf), as this seems to mitigate the worst of it. It also seems that spam storms give the process a bit of grief, and that heavy load causes some children to die.

I wish I had more I could tell you, but that's what I have. Oh, and are you updating modules out of CPAN? On Debian I believe this works correctly, but on RHEL/CentOS it has been documented for some time to cause all kinds of grief. If all else fails, try upgrading to the newest version of SpamAssassin from CPAN, which in the past has also mitigated some issues.

UPDATE:

The official page mentions several issues, including some incompatibilities between Net::Server and newer versions. As I don't have any version numbers to reference, I suggest you (a) get the version of amavis you're using from rpm -q, and (b) use CPAN to determine what version of Net::Server you're running.

Avery Payne
  • Thank you, I will try adjusting the child lifetime and see if it helps. I also read somewhere about a related issue that increasing sa_timeout could help. – Brent Jun 22 '09 at 13:17
  • Thanks a lot - I have added the version numbers in a comment to the original question, but will check out the link above. – Brent Jun 23 '09 at 21:35
  • Looks like my versions are considerably newer than those mentioned in your link. – Brent Jun 23 '09 at 21:38
0

I think I found the solution:

Essentially, the pyzor server's IP address has changed. This can be confirmed by running pyzor ping and su amavis -c 'pyzor ping' and getting timeouts for each.

This can be resolved by running pyzor discover; su amavis -c 'pyzor discover', and by setting up a regular cron job to run these commands (in case the address changes again in the future).
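A minimal sketch of what such a cron-driven script might look like is shown here. The helper name `pyzor_discover_for`, the `PYZOR` override variable, and the list of accounts are all assumptions for illustration; only `pyzor discover` and the `su amavis -c` form come from the commands above.

```shell
#!/bin/sh
# Sketch: re-run "pyzor discover" for each account that uses pyzor.
# PYZOR can be overridden if the binary is not on cron's PATH (assumption).
PYZOR=${PYZOR:-pyzor}

# pyzor_discover_for USER   (hypothetical helper)
# Runs discover directly when already USER, otherwise switches user with su
# (which needs root).
pyzor_discover_for() {
    if [ "$(id -un)" = "$1" ]; then
        "$PYZOR" discover
    else
        su "$1" -c "$PYZOR discover"
    fi
}

# Usage, e.g. from a root cron job (the account list is an assumption):
#   pyzor_discover_for root
#   pyzor_discover_for amavis
```

Running the refresh per account mirrors the two commands above, since each user ran its own discover.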

Ever since I made this change, amavisd has stopped "jamming" on me.

Brent