
On a Linux box running Postfix + Amavis + SpamAssassin, we are considering enabling Bayes filtering. The system already filters spam (without Bayes) for multiple customer domains.

The question is: how should training be done in this scenario? Do we need to collect spam and ham from each client, or is a single corpus feeding one global database enough?

Thanks.

  • The Bayesian filtering is global, but SA also learns on its own. Unless you are prepared to give it a set of messages already sorted into ham and spam to learn from, just let it build its own database. It uses its other rules to decide what is spam and what is ham, and the Bayes filtering improves over time. I forget the exact numbers, but I remember something like 100 messages having to be classified or seen before Bayes turns on, so it won't start filtering until it has learned a little. It only auto-learns from messages that the other rules already score as very spammy or very hammy. – Appleoddity Oct 14 '17 at 05:31

1 Answer


The Bayes database is global for each SA configuration. You can set its location with the bayes_path option in the local.cf configuration file. See https://wiki.apache.org/spamassassin/SiteWideBayesSetup for more detail.
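As a rough illustration, a site-wide setup in local.cf might look something like the fragment below. The directory is a hypothetical example rather than a required location, the file's own path varies by distribution, and the threshold values are the documented defaults as far as I know, so check them against the documentation for your SA version.

```
# local.cf (often /etc/spamassassin/local.cf or /etc/mail/spamassassin/local.cf)

# Turn on the Bayes classifier and let SA auto-learn from mail
# that its other rules already score as clearly ham or clearly spam.
use_bayes 1
bayes_auto_learn 1

# Path *prefix* for the shared database; SA appends suffixes such as
# _toks and _seen. The directory here is a hypothetical example and must
# be writable by the user that Amavis runs SA as.
bayes_path /var/lib/spamassassin/bayes/bayes
bayes_file_mode 0770

# Bayes scores are not applied until this many ham and spam messages
# have been learned (believed to be the defaults; verify for your version).
bayes_min_ham_num 200
bayes_min_spam_num 200
```

Because Amavis calls SA itself, the database has to be readable and writable by the Amavis user. If bayes_path is left unset, the database ends up under that user's home directory.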

You can perform the initial training of the database with your own sets of ham and spam messages, or wait for SA to learn from the messages it receives through Postfix.
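Initial training is usually done with sa-learn. The sketch below is a minimal wrapper under a few assumptions: the corpus directories are hypothetical, the `run` and `train_bayes` helpers are made up for illustration, and it defaults to a dry run that only prints the commands. The `--spam`, `--ham` and `--dump magic` options are standard sa-learn options.

```shell
#!/bin/sh
# Hypothetical corpus locations: one message per file (or maildir-style dirs).
SPAM_DIR="${SPAM_DIR:-/var/lib/sa-training/spam}"
HAM_DIR="${HAM_DIR:-/var/lib/sa-training/ham}"

# With DRY_RUN=1 (the default in this sketch) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Helper (illustrative): print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Helper (illustrative): feed both corpora to the Bayes database,
# then show the database counters (nspam / nham).
train_bayes() {
    run sa-learn --spam "$SPAM_DIR"
    run sa-learn --ham "$HAM_DIR"
    run sa-learn --dump magic
}

train_bayes
```

To train for real, run it with DRY_RUN=0 as the same user Amavis runs SA as (often `amavis` or `vscan`, depending on the distribution), so the tokens land in the database that the filter actually reads.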

You may want to set up separate SA configurations for different domains if the typical message content differs too much between them, so that many borderline messages should be marked as spam for users of one domain but as ham for users of another.