I have been working on getting SpamAssassin up and running for a while now and am pretty close to being finished. However, one last thing keeps nagging at me that I can't figure out. I have searched around but have not found a conclusive answer, so I would like a little clarity so I can sleep better at night.
I have read that SpamAssassin needs at least 200 messages, preferably 1,000, to do an effective job of Bayesian filtering. I have been feeding it spam (at least I think I have) with the following command:
sa-learn --showdots --mbox --spam spamfolder
As far as I can tell, SpamAssassin is processing the folder. So I run:
sa-learn --dump magic
and get the following output:
bruticus@bruticus:~$ sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 306 0 non-token data: nspam
0.000 0 210 0 non-token data: nham
0.000 0 68430 0 non-token data: ntokens
0.000 0 1318421928 0 non-token data: oldest atime
0.000 0 1319141693 0 non-token data: newest atime
0.000 0 1319142287 0 non-token data: last journal sync atime
0.000 0 1319142287 0 non-token data: last expiry atime
0.000 0 0 0 non-token data: last expire atime delta
0.000 0 0 0 non-token data: last expire reduction count
Do the values on the nspam and nham lines (306 and 210 above) reflect the number of messages SpamAssassin has actually learned from and is using for its Bayesian analysis?
Do I need to get both of these numbers into the thousands before SpamAssassin really starts doing its job? If not, how do I know when I have fed it enough spam to start working correctly?
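In case it helps anyone answering, here is a minimal sketch of how I pull just those two counters out of the dump while I keep training. The function name bayes_counts is my own invention, not part of SpamAssassin, and it assumes the output layout shown above: the counter name is the last field on the line and its value is the third field.

```shell
#!/bin/sh
# bayes_counts is a hypothetical helper, not a SpamAssassin command.
# It reads `sa-learn --dump magic` output on stdin and prints the
# nspam and nham counters on one line.
# Assumption: the counter name is the last whitespace-separated field
# and its value is the third field, as in the output pasted above.
bayes_counts() {
    awk '
        $NF == "nspam" { spam = $3 }   # messages learned as spam
        $NF == "nham"  { ham  = $3 }   # messages learned as ham
        END { printf "nspam=%d nham=%d\n", spam, ham }
    '
}

# Usage (run as the same user whose Bayes database you trained):
#   sa-learn --dump magic | bayes_counts
```

With the output I pasted above, this prints nspam=306 nham=210, which at least makes it easy to watch the counts change after each sa-learn run.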