Spamassassin and Dovecot mdbox Compatibility

Question

So I'm thinking of using Dovecot's mdbox format for storing mail, however I'll also be using Spamassassin and need to be able to pass it a folder of e-mails for its filters to be applied to.

Can this be done from the mdbox format directly? If not, is there some way that I can extract some or all of the contents of an mdbox mailbox in a Spamassassin friendly way? If so, is it possible to pipe it into spamassassin (rather than having to extract into a folder)?

Haravikk · Accepted Answer · 2023-04-15T09:53:51.387

My solution to this was to configure Dovecot's antispam plugin (the mailtrain/pipe backend) to pass messages to a script as spam or ham when they are moved into or out of my junk mailboxes respectively, so that they can be learned later by a cron job. While it's possible to pass the messages to sa-learn directly, this could mean learning accidental mis-filings, and it's much slower than just dumping the file for later. This is also likely to work only when using a global spamassassin Bayesian database, i.e. if your e-mail users are virtual rather than added as unix user accounts.

First of all you'll want to create the mail-training script. I created mine at /etc/dovecot/dovecot-mailtrain.sh for convenience, with appropriate permissions so that dovecot can execute it:

#!/bin/bash
root_dir='/var/lib/mailtrain'

# Determine which are the right and wrong directories
[ "$1" = 'ham' ] && { add='ham'; remove='spam'; } || { add='spam'; remove='ham'; }

# Generate a unique ID for the message while saving to tmp
trap 'rm -f "$root_dir/tmp/$$" 2>/dev/null' INT HUP TERM EXIT
sha=$(tee "$root_dir/tmp/$$" | shasum -a 256 | awk '{print $1}')

# Remove file if it already exists in the wrong folder
[ -e "$root_dir/$remove/$sha" ] && rm "$root_dir/$remove/$sha"

# Move tmp file into correct folder
mv "$root_dir/tmp/$$" "$root_dir/$add/$sha"
exit 0

Note: I'm generating unique filenames using shasums because I found I couldn't rely on messages having been given a unique message ID at this point.

You'll need to create the /var/lib/mailtrain directory and make it accessible to dovecot, then create three sub-directories for spam, ham and tmp that dovecot can write to.
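
The directory setup might look like the following sketch. The function name setup_mailtrain is hypothetical, and the owner shown in the example (dovecot) is an assumption: use whichever user your Dovecot mail processes actually run as, which may be a dedicated virtual-mail user.

```shell
# Hypothetical helper: create the mailtrain tree and optionally hand it to an owner.
setup_mailtrain() {
    local root_dir="$1" owner="$2"
    mkdir -p "$root_dir/spam" "$root_dir/ham" "$root_dir/tmp" || return 1
    # Owner gets full access, group may read and traverse, others get nothing
    chmod 750 "$root_dir" "$root_dir/spam" "$root_dir/ham" "$root_dir/tmp" || return 1
    if [ -n "$owner" ]; then
        chown -R "$owner" "$root_dir" || return 1
    fi
}

# Example (run as root; 'dovecot' is an assumed user name):
# setup_mailtrain /var/lib/mailtrain dovecot
```

If the cron job that later imports the messages runs as a different user from the one writing them, the sub-directories would need group write access (for example mode 770) instead.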

The next step is to configure dovecot. I created a new file at /etc/dovecot/conf.d/90-antispam.conf as follows:

### Dovecot Anti-Spam ###
# Automatically pipes messages to the mail-training script as spam or ham
# if they are moved to or from the Spam mailbox respectively

plugin {
    antispam_backend = pipe
    antispam_pipe_program = /etc/dovecot/dovecot-mailtrain.sh
    antispam_pipe_program_spam_arg = spam
    antispam_pipe_program_notspam_arg = ham
    antispam_pipe_tmpdir = /tmp

    # Mailboxes to respond to
    antispam_spam = Spam;Junk
    antispam_trash = Deleted Messages;Trash
    #antispam_unsure = Virus
}

Unfortunately this seems to operate by mailbox name only, so if a user creates a mailbox with a name that isn't recognised as spam or trash above, then it may not be treated correctly, even if it is designated for spam/trash use.

After a service dovecot reload, messages moved to a spam folder will appear under /var/lib/mailtrain/spam and messages moved out of a spam folder will appear under /var/lib/mailtrain/ham; the script ensures that a message never appears under both folders. The last step is therefore to create a script for actually importing these messages as spam/ham:

#!/bin/bash
root_dir='/var/lib/mailtrain'

sa-learn --no-sync --spam "$root_dir/spam" && find "$root_dir/spam" -mindepth 1 -delete
sa-learn --no-sync --ham "$root_dir/ham" && find "$root_dir/ham" -mindepth 1 -delete
sa-learn --sync

This clears each folder after its contents have been imported, then runs a single sync operation after both are imported, rather than syncing twice. Store this script somewhere suitable for running as a cron job, then schedule it with crontab -e. You can do this as root, but ideally you should give the cron job to another user. That user will need access to /var/lib/mailtrain (and write access to its sub-directories) and must be a member of the spamd or debian-spamd group (whichever group owns /var/lib/spamassassin). I did this by adding dovecot to the spamd group with usermod -a -G spamd dovecot, then giving it the cron job via crontab -u dovecot -e.
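
If you would rather script the scheduling than edit the crontab interactively, something like the following sketch may work. The function name add_cron_line, the 30-minute schedule and the script path /usr/local/bin/mailtrain-import.sh are all placeholders of mine, and reading a new crontab from stdin with a trailing - is an assumption that holds for the common Vixie-style cron implementations.

```shell
# Hypothetical helper: read an existing crontab on stdin and print it with
# the given line appended, unless that exact line is already present.
add_cron_line() {
    local line="$1" current
    current=$(cat)
    [ -n "$current" ] && printf '%s\n' "$current"
    if ! printf '%s\n' "$current" | grep -Fxq -- "$line"; then
        printf '%s\n' "$line"
    fi
    return 0
}

# Example (as root): run the import every 30 minutes as the dovecot user.
# The script path is a placeholder for wherever you stored the import script.
# crontab -u dovecot -l 2>/dev/null \
#     | add_cron_line '*/30 * * * * /usr/local/bin/mailtrain-import.sh' \
#     | crontab -u dovecot -
```

The exact-line check makes the helper safe to run repeatedly without adding duplicate entries.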

With this setup spamassassin will automatically learn spam/ham based upon what users do with their mail. However, if it hasn't been trained before, you will still need to give it some initial messages to learn. Fortunately this can now be done easily using any suitable mail client: import a bunch of ham messages into a temporary mailbox, move them into the spam mailbox, then move them back out of it. Then take a bunch of spam, import it into the temporary mailbox, and move it into the spam mailbox. You should now have a bunch of messages under /var/lib/mailtrain/spam and /var/lib/mailtrain/ham. Once sa-learn has imported at least two hundred of each, spamassassin will be ready to begin adding spam headers to your messages.
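
To see how close you are to the two hundred message threshold, you can inspect the Bayes database with sa-learn --dump magic. The sketch below pulls out the learned counts; the function name bayes_counts is hypothetical, and it assumes the usual output layout in which the nspam and nham rows carry the count in the third column.

```shell
# Hypothetical helper: read `sa-learn --dump magic` output on stdin and print
# the learned spam and ham message counts (third column of the nspam/nham rows).
bayes_counts() {
    awk '$NF == "nspam" { spam = $3 }
         $NF == "nham"  { ham = $3 }
         END { printf "spam=%d ham=%d\n", spam, ham }'
}

# Example: run as a user that can read the Bayes database
# sa-learn --dump magic | bayes_counts
```

Run it as the same user that owns the cron job, so that it reads the same database the import script writes to.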
