When my mail setup detects that a message is spam, it puts *SPAM* in the subject. Now I want to improve my Bayes filter by training it on my corpus of spam.

If I feed these thousands of messages to sa-learn, will that work even if they still have *SPAM* in the subject? Or will it effectively tell the filter “something is only spam if it has *SPAM* in the subject”, which would be counter-productive?

Joachim Breitner

1 Answer

According to the man page for sa-learn, this is fine; you can train on the messages as they are:

"If the messages you are learning from have already been filtered through SpamAssassin, the learner will compensate for this. In effect, it learns what each message would look like if you had run spamassassin -d over it in advance."
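
To make the idea of "what each message would look like" without the markup more concrete, here is a minimal Python sketch that strips a subject tag from a raw message before any further processing. It is only an illustration, not SpamAssassin's own code: the function name `strip_subject_tag`, the constant `SUBJECT_TAG`, and the sample message are all hypothetical, and the tag value is simply taken from the question. Real SpamAssassin markup can include more than a subject tag, so this does not reproduce everything `spamassassin -d` does.

```python
from email import message_from_string

# Assumption: the tag is the literal string from the question.
SUBJECT_TAG = "*SPAM*"


def strip_subject_tag(raw_message: str, tag: str = SUBJECT_TAG) -> str:
    """Return the message with a leading subject tag removed, if present."""
    msg = message_from_string(raw_message)
    subject = msg["Subject"]
    if subject is not None and subject.startswith(tag):
        # Drop the tag and any whitespace that followed it.
        msg.replace_header("Subject", subject[len(tag):].lstrip())
    return msg.as_string()


# Hypothetical sample message, for illustration only.
sample = "From: a@example.com\nSubject: *SPAM* Cheap watches\n\nBuy now\n"
cleaned = strip_subject_tag(sample)
untouched = strip_subject_tag("Subject: Hello\n\nHi\n")
```

Per the man page passage quoted above, a preprocessing step like this should not be needed before running sa-learn on mail that SpamAssassin itself tagged; the sketch only shows the kind of normalization the quote describes.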

miken32