Spamassassin, sa-learn with subdirectories

Question

I am having trouble finding a good way to set up SpamAssassin (sa-learn) so that it learns from e-mails in subdirectories.

I have read a lot of tutorials. Every one of them shows something like this:

/usr/bin/sa-learn --no-sync --ham /var/vmail/*/*/Maildir/cur
/usr/bin/sa-learn --no-sync --spam /var/vmail/*/*/Maildir/.Junk/{cur,new}

That's all.

I just tried my own way like this:

/usr/bin/sa-learn --no-sync --ham /var/vmail/*/*/Maildir/.*/{cur,new}
/usr/bin/sa-learn --no-sync --ham /var/vmail/*/*/Maildir/cur
/usr/bin/sa-learn --no-sync --spam /var/vmail/*/*/Maildir/.Junk/{cur,new}
/usr/bin/sa-learn --forget /var/vmail/*/*/Maildir/.Trash/{cur,new}
/usr/bin/sa-learn --sync

First, I learn every directory created by a user as ham. After that, I tell SpamAssassin that .Junk is the spam location. As a third step, I forget the Trash. Finally, I sync.

An example of an email account:

.Draft
.Junk
.Trash
.Important
.Important.Others
.Important.Others.Others
cur
new

Thank you for your help!

best regards

My way is working. But is this the right way? First the script marks all e-mails as ham (including e-mails that were already marked as spam in the previous run). After that, the script marks the e-mails in the .Junk dir as spam. So in the end I don't have reliable statistics (because the spam mark is removed and then added again on every run). I thought there would be a better way. – Sanny F., Dec 05 '16 at 13:05

score 1 · Answer 1 · answered Mar 18 '17 at 08:00

You should decide whether you want to learn from the `new` folders as well. I don't, because these folders may contain false positives and false negatives. Once an email is in a `cur` folder, I have read it and moved it to the right folder (ham or spam).
Why do you forget the email in the trash? It depends on how you use your trash. In my workflow, only ham ends up in the trash, so you may as well use that data to train the classifier.
Using --no-sync on the --forget call as well may speed things up.
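
Put together, the suggestions above could look roughly like the sketch below. It is not a tested production script: `VMAIL_ROOT`, `SA_LEARN` and the function names are made up for illustration, and only the `/var/vmail/*/*/Maildir` layout and the sa-learn options are taken from the question. It learns from `cur` only, skips .Junk when collecting ham, and counts .Trash as ham, which only fits a workflow like mine.

```shell
#!/bin/sh
# Sketch only. VMAIL_ROOT, SA_LEARN and the function names are illustrative
# assumptions; the directory layout is the one shown in the question.
VMAIL_ROOT="${VMAIL_ROOT:-/var/vmail}"
SA_LEARN="${SA_LEARN:-/usr/bin/sa-learn}"

# Print every ham "cur" directory: the inbox plus all dot-folders except .Junk.
# .Trash counts as ham here; add more cases below if that does not fit.
list_ham_dirs() {
    for dir in "$VMAIL_ROOT"/*/*/Maildir/cur "$VMAIL_ROOT"/*/*/Maildir/.[!.]*/cur; do
        [ -d "$dir" ] || continue          # skip patterns that matched nothing
        case "$dir" in
            */.Junk/cur) continue ;;       # spam folder, handled separately
        esac
        printf '%s\n' "$dir"
    done
}

# Print every spam "cur" directory.
list_spam_dirs() {
    for dir in "$VMAIL_ROOT"/*/*/Maildir/.Junk/cur; do
        [ -d "$dir" ] || continue
        printf '%s\n' "$dir"
    done
}

# Learn ham and spam without syncing after each call, then sync once.
learn_all() {
    list_ham_dirs | while IFS= read -r dir; do
        "$SA_LEARN" --no-sync --ham "$dir"
    done
    list_spam_dirs | while IFS= read -r dir; do
        "$SA_LEARN" --no-sync --spam "$dir"
    done
    "$SA_LEARN" --sync
}
```

To use it, call `learn_all` at the end of the script, for example from cron. Running it with `SA_LEARN=echo` first prints the commands instead of executing them, which is a cheap way to check which folders would be picked up.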

score -1 · Answer 2 · answered Dec 19 '16 at 14:15


Is it necessary to learn all the emails as ham? Are you sure there is no spam in these folders? With your first commands, don't you risk learning spam as ham?

Why not learn only spam, from the Junk folder?


Nic0


Yes, you need to learn ham as well. Otherwise, SpamAssassin will not know what ham looks like, so in the end it will classify everything as spam and think it has built a 1.0 score classifier. But he shouldn't be training on the `new` folders. See my answer. – Mar 18 '17 at 08:01
