My solution was to configure Dovecot's antispam/mailtrain plugin to pass messages to a script as spam or ham when they are moved into or out of my junk mailboxes respectively, so that they can be learned later by a cron job. While it's possible to pass the messages to sa-learn directly, this could mean learning accidental mis-filings, and it's much slower than just dumping the file for later. This setup is also likely to work only when using a global spamassassin Bayesian database, i.e. if your e-mail users are virtual rather than added as unix user accounts.
First of all you'll want to create the mail-training script. I created mine at /etc/dovecot/dovecot-mailtrain.sh for convenience, with appropriate permissions so that dovecot can execute it:
#!/bin/bash
root_dir='/var/lib/mailtrain'
# Determine which are the right and wrong directories
[ "$1" = 'ham' ] && { add='ham'; remove='spam'; } || { add='spam'; remove='ham'; }
# Generate a unique ID for the message while saving to tmp
trap '[ -e "$root_dir/tmp/$$" ] && rm -f "$root_dir/tmp/$$" 2>/dev/null' INT HUP TERM EXIT
sha=$(tee "$root_dir/tmp/$$" | shasum -a 256 | awk '{print $1}')
# Remove file if it already exists in the wrong folder
[ -e "$root_dir/$remove/$sha" ] && rm "$root_dir/$remove/$sha"
# Move tmp file into correct folder
mv "$root_dir/tmp/$$" "$root_dir/$add/$sha"
exit 0
Note: I'm generating unique filenames using SHA-256 checksums because I found I couldn't rely on messages having been given a unique message ID at this point.
You'll need to create the /var/lib/mailtrain directory and make it accessible to dovecot, then create three sub-directories (spam, ham and tmp) that dovecot can write to.
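The directory layout described above could be created along these lines. This is only a sketch: setup_mailtrain_dirs is a hypothetical helper name, and the dovecot:dovecot owner in the example is an assumption (with virtual users the mail processes may run as a different account, so check which user actually executes the script).

```shell
#!/bin/bash
# Hypothetical helper: create the mailtrain root plus its spam, ham and tmp
# sub-directories, owned by the account that runs the training script.
setup_mailtrain_dirs() {
    local root="$1" owner="$2"
    mkdir -p "$root/spam" "$root/ham" "$root/tmp" || return 1
    chown -R "$owner" "$root" || return 1
    # Owner may read/write/traverse, group may read/traverse, others get nothing
    chmod 750 "$root" "$root/spam" "$root/ham" "$root/tmp"
}

# Example (run as root); the owner is an assumption, adjust for your system:
# setup_mailtrain_dirs /var/lib/mailtrain dovecot:dovecot
```

Keeping tmp inside the same root as spam and ham means the final mv in the training script stays on one filesystem, so it is a rename rather than a copy.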
Next, configure dovecot. To do this I decided to create a new file at /etc/dovecot/conf.d/90-antispam.conf as follows:
### Dovecot Anti-Spam ###
# Automatically passes messages to the mail-training script as spam or ham
# if they are moved to or from the Spam mailbox respectively
plugin {
antispam_backend = pipe
antispam_pipe_program = /etc/dovecot/dovecot-mailtrain.sh
antispam_pipe_program_spam_arg = spam
antispam_pipe_program_notspam_arg = ham
antispam_pipe_tmpdir = /tmp
# Mailboxes to respond to
antispam_spam = Spam;Junk
antispam_trash = Deleted Messages;Trash
#antispam_unsure = Virus
}
Unfortunately this seems to operate by mailbox name only, so if a user creates a mailbox with a name that isn't recognised as spam or trash above, then it may not be treated correctly, even if it is designated for spam/trash use.
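Because matching is by name, it can help to look for mailboxes whose names suggest junk but which are missing from antispam_spam. A rough sketch, assuming doveadm mailbox list -u is available on your install; unlisted_junk_mailboxes is a hypothetical helper, user@example.com is a placeholder, and the Spam and Junk names simply mirror the config above. Names carrying a namespace prefix will also be reported, so treat the output as a hint rather than a verdict.

```shell
#!/bin/bash
# Hypothetical helper: read mailbox names on stdin (one per line) and print
# those that look like junk folders but are not exactly "Spam" or "Junk".
unlisted_junk_mailboxes() {
    grep -Ei 'spam|junk' | grep -Evx 'Spam|Junk'
}

# Example (placeholder address):
# doveadm mailbox list -u user@example.com | unlisted_junk_mailboxes
```

Any name it prints can then be added to antispam_spam, separated by a semicolon as in the config above.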
After a service dovecot reload, messages moved to a spam folder will appear under /var/lib/mailtrain/spam and messages moved out of a spam folder will appear under /var/lib/mailtrain/ham; the script ensures that a message never appears under both folders. The last step therefore is to create a script for actually importing these messages as spam/ham:
#!/bin/bash
root_dir='/var/lib/mailtrain'
sa-learn --no-sync --spam "$root_dir/spam" && find "$root_dir/spam" -mindepth 1 -delete
sa-learn --no-sync --ham "$root_dir/ham" && find "$root_dir/ham" -mindepth 1 -delete
sa-learn --sync
This clears each folder after its contents have been imported, then runs a single sync operation after both are imported, rather than syncing twice. Store this script somewhere suitable for running as a cron job, then schedule it with crontab -e. You can do this as root, but ideally you should give the cron job to another user. That user will need access to /var/lib/mailtrain (and write access to its sub-directories) and must be a member of the spamd or debian-spamd group (whichever group owns /var/lib/spamassassin). I did this by adding dovecot to the spamd group with usermod -a -G spamd dovecot, then giving it the cron job via crontab -u dovecot -e.
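For reference, a crontab entry for the import script might look like the line this sketch prints. The helper name mailtrain_cron_entry, the script path /usr/local/bin/mailtrain-learn.sh and the hourly schedule are all assumptions for illustration; use whatever path and interval suit your server.

```shell
#!/bin/bash
# Hypothetical helper: print a crontab line that runs the given script once
# an hour, at the given minute (defaults to 15 past).
mailtrain_cron_entry() {
    local script="$1" minute="${2:-15}"
    printf '%s * * * * %s\n' "$minute" "$script"
}

# Example (run as root): append the entry to the dovecot user's crontab
# without opening an editor. The script path is an assumption.
# { crontab -u dovecot -l 2>/dev/null; mailtrain_cron_entry /usr/local/bin/mailtrain-learn.sh; } | crontab -u dovecot -
```

Cron normally mails any output to the owning user, so errors from sa-learn should surface there unless you redirect them.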
With this setup spamassassin will automatically learn spam/ham based upon what users do with their mail. However, if it hasn't been trained before, you will still need to give it some initial messages to learn from. Fortunately this can now be done easily using any suitable mail client: import a batch of ham messages into a temporary mailbox, move them into the spam mailbox, then move them back out of it. Then take a batch of spam, import it into the temporary mailbox, and move it into the spam mailbox. You should now have messages under /var/lib/mailtrain/spam and /var/lib/mailtrain/ham; once sa-learn has imported at least two hundred of each, spamassassin will be ready to begin adding spam headers to your messages.
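To see how close you are to that threshold, sa-learn --dump magic reports the number of learned messages in its nspam and nham rows. The sketch below pulls out those two numbers; bayes_counts is a hypothetical helper, and it assumes the usual column layout in which the count is the third field of each row.

```shell
#!/bin/bash
# Hypothetical helper: read `sa-learn --dump magic` output on stdin and print
# the learned message counts taken from the "nspam" and "nham" rows.
bayes_counts() {
    awk '$NF == "nspam" { spam = $3 }
         $NF == "nham"  { ham = $3 }
         END { printf "spam=%d ham=%d\n", spam, ham }'
}

# Example, run as a user that can read the Bayes database:
# sa-learn --dump magic | bayes_counts
```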