Suggested mechanisms for user-driven spam training?

Question

I'm looking for a way to let my mail users manage their own spam training completely. Before I get into it, here are my mail server details:

Debian 7.5, postfix 2.9.6, dovecot 2.1.7, amavisd-new 2.7.1, spamassassin 3.3.2

So, each of my users in each domain has a Junk folder (/var/vmail/domain/user/.Junk) where they can put spam that doesn't get flagged as such. Then I have this script in place:

/etc/cron.daily/learnspam

#!/bin/sh

# Learn everything in each user's Junk maildir as spam.
find /var/vmail -type d -name .Junk -exec echo "Examining {}..." \; \
    -exec sa-learn --dbpath=/var/lib/amavis/.spamassassin --spam {}/cur \;

Each user also has a folder called False Positives, into which they can drag messages that were erroneously marked as spam. A second daily script learns those messages as ham and moves them back to the inbox.

/etc/cron.daily/falsepos

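#!/bin/sh

# Work in a private temp directory instead of predictable names in /tmp.
tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT

# Export each message in "False Positives", then move it back to INBOX.
doveadm search -A mailbox 'False Positives' 2>/dev/null | while read -r user guid uid; do
    doveadm fetch -u "$user" text mailbox-guid "$guid" uid "$uid" > "$tmpdir/$guid-$uid.eml"
    doveadm move -u "$user" INBOX mailbox-guid "$guid" uid "$uid"
done

# Only call sa-learn when at least one message was exported.
if ls "$tmpdir"/*.eml >/dev/null 2>&1; then
    sa-learn --dbpath=/var/lib/amavis/.spamassassin --ham "$tmpdir"/*.eml
fi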

My question is, am I doing this correctly? Is there a better way? Does sa-learn work properly with amavis? I figure as long as I'm using the --dbpath=/var/lib/amavis/.spamassassin option, it should work fine.

People *ain't got time for this* in 2014!! Users expect a hands-off experience when it comes to spam filtering! - I drop a Barracuda Spam Filter (which uses Barracuda's RBL and SpamAssassin) into my customers' sites and train 200 HAM/200 SPAM messages and let it go. — ewwhite, Jul 22 '14 at 20:56
Then people can continue to lose emails they think they should have got, and receive emails they would rather not have. How anyone can expect a machine to **know** what emails they do and don't like, in 2014 or any other year that doesn't have telepathy circuits, is beyond me. — MadHatter, Jul 23 '14 at 09:23

score 3 · Accepted Answer · answered Jul 23 '14 at 07:13


You might want to take a look at dspam. It integrates with Dovecot and does basically exactly what you want, but on the fly, as the move operations happen (moving into Junk => spam, moving out of Junk => false positive).
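The move-based rule described above can be sketched as a small shell function. This is only an illustration of the decision, not the plugin's actual code: the function name `classify_move` and the default folder name `Junk` are assumptions, and the real plugin handles more cases than this.

```shell
# Decide what a folder move should teach the filter.
# Prints "spam", "innocent" (dspam's name for ham) or "none".
# Hypothetical helper for illustration; not part of dspam or Dovecot.
classify_move() {
    src=$1 dst=$2 junk=${3:-Junk}
    if [ "$dst" = "$junk" ] && [ "$src" != "$junk" ]; then
        echo spam        # moved into Junk: a missed spam
    elif [ "$src" = "$junk" ] && [ "$dst" != "$junk" ]; then
        echo innocent    # moved out of Junk: a false positive
    else
        echo none        # any other move carries no training signal
    fi
}
```

For example, `classify_move INBOX Junk` prints `spam`, while `classify_move Junk INBOX` prints `innocent`.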

answered by moenoel

Does dspam replace SpamAssassin? – CaptSaltyJack Jul 24 '14 at 01:09
They can work in parallel, afaik. – moenoel Jul 24 '14 at 07:46
@CaptSaltyJack dspam replaces some of what spamassassin does. By moving other parts like RBL lists to your postfix (or other MTA) you can construct a mail server which can handle reasonably high volume while still filtering incoming mail before queueing it, so you can reject mail without having to bounce it. spamassassin is unfortunately very resource intensive. – mc0e Aug 20 '14 at 14:11

score 2 · Answer 2 · answered Jul 23 '14 at 08:40

Your approach looks fine; I do something similar.

Two remarks:

1. Using --dbpath is good: it prevents a common setup error where SA reads a DB in ~amavis while sa-learn writes to a different DB in ~root.
2. One design limitation regarding multi-user operation: SpamAssassin uses a single global Bayes DB, not one DB per user.
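One way to confirm that the cron jobs really feed the database amavis reads is to look at its message counters after a run. The sketch below assumes the usual `sa-learn --dump magic` line format, in which the third column holds the value and the last word names the counter; `bayes_counts` is a hypothetical helper name. It may also be worth remembering that, by default, SpamAssassin does not use Bayes scores until the DB holds at least 200 spam and 200 ham messages.

```shell
# Summarise `sa-learn --dump magic` output read from stdin.
# Hypothetical helper; prints e.g. "nspam=250 nham=410".
bayes_counts() {
    awk '$NF == "nspam" { spam = $3 }
         $NF == "nham"  { ham = $3 }
         END { printf "nspam=%d nham=%d\n", spam, ham }'
}

# Intended use on the mail server:
#   sa-learn --dbpath=/var/lib/amavis/.spamassassin --dump magic | bayes_counts
```

If the counters do not grow after users file messages into Junk or False Positives, sa-learn is probably writing to a different DB than the one amavis uses.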

score 0 · Answer 3 · answered Jul 26 '14 at 08:54

Dspam does Bayesian filtering better than SpamAssassin. Many other filtering mechanisms, such as RBLs, greylisting and DNS validity checks, can be configured in the MTA (e.g. postfix). With this approach you only look at email content after the other tests have passed, which makes the system much less resource hungry. You don't get the same weighted combination, but if set up well you can get a very good spam system which uses much less CPU and RAM. Also, the dovecot plugin is triggered by moving mail between folders, which is much nicer than having separate folders for training.
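As a rough sketch of pushing those cheap checks into postfix, the function below prints a candidate smtpd_recipient_restrictions value. The restriction names are standard Postfix ones, but the function name `mta_restrictions`, the RBL zone and the greylisting policy-service address are assumptions; substitute whatever your site actually uses, and review the result before applying it.

```shell
# Print a candidate smtpd_recipient_restrictions value, cheapest checks first.
# $1: RBL zone (assumed default), $2: greylisting policy service (assumed default).
mta_restrictions() {
    rbl_zone=${1:-zen.spamhaus.org}
    policy=${2:-inet:127.0.0.1:10023}
    printf '%s' "permit_mynetworks, permit_sasl_authenticated, reject_unauth_destination, "
    printf '%s' "reject_non_fqdn_sender, reject_unknown_sender_domain, "
    printf '%s\n' "reject_rbl_client $rbl_zone, check_policy_service $policy"
}

# Intended use on the mail server:
#   postconf -e "smtpd_recipient_restrictions = $(mta_restrictions)"
#   postfix reload
```

Mail rejected by these checks never reaches the content filter, which is where the CPU and RAM savings come from.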
