
I am hoping to write a SpamAssassin rule that will match if an e-mail contains more than n occurrences of a specific character in the body of the message, not necessarily consecutive. For example, if there were ten asterisks in total in a message, the rule should match.

I know how to write a rule that matches ten consecutive asterisks, like so:

body TEN_ASTERISKS /(**********)/
score TEN_ASTERISKS 0.5
describe TEN_ASTERISKS Message body contains ten asterisks.

How would I rewrite the rule to match on ten total asterisks, not necessarily in a row?

dub stylee

1 Answer


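The above rule is malformed and will not work: in a regular expression, `*` is a quantifier meaning "zero or more of the preceding item", and the first `*` has nothing before it to repeat.

SpamAssassin rules are (Perl) regular expressions, so you merely need to escape the asterisks: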

body TEN_ASTERISKS /\*\*\*\*\*\*\*\*\*\*/
score TEN_ASTERISKS 0.5
describe TEN_ASTERISKS Message body contains ten asterisks.
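To see why the original rule fails and the escaped one works, here is a quick check using Python's `re` module. This is a stand-in, not SpamAssassin itself (which evaluates Perl regular expressions), but both engines reject a bare leading `*` and treat `\*` as a literal asterisk. The `compiles` helper is hypothetical, written only for this example.

```python
import re

# The rule from the question, without the surrounding slashes.
UNESCAPED = r"(**********)"
# The corrected rule: ten escaped asterisks, each a literal character.
ESCAPED = r"\*" * 10


def compiles(pattern: str) -> bool:
    """Return True if the pattern is a valid regular expression."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


print(compiles(UNESCAPED))  # False: the first "*" has nothing to repeat
print(compiles(ESCAPED))    # True
print(bool(re.search(ESCAPED, "Act now " + "*" * 10 + " today")))  # True
```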

Also, the parentheses are unnecessary. They form a capture group whose contents are stored in memory, which (assuming you see a lot of asterisk-ridden mail) adds load your server does not need.
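A small Python illustration (again using `re` as a stand-in for SpamAssassin's Perl regexes) of what the parentheses actually change: both patterns find the same run of asterisks, but only the parenthesized one records a capture group. It shows the group being recorded, not how much that costs on a real mail server.

```python
import re

TEN = r"\*" * 10  # ten escaped asterisks, as in the corrected rule

with_group = re.compile("(" + TEN + ")")  # the question's parentheses kept
without_group = re.compile(TEN)           # parentheses dropped

body = "Limited offer " + "*" * 10

# Both patterns decide the match the same way...
print(bool(with_group.search(body)), bool(without_group.search(body)))  # True True
# ...but only the parenthesized one defines (and fills) a capture group.
print(with_group.groups, without_group.groups)  # 1 0
print(with_group.search(body).group(1) == "*" * 10)  # True
```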

(You can also write `body TEN_ASTERISKS /\*{10}/`, but I wanted to make a simpler example. Regular expressions are very powerful.)
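A quick Python check, with `re` standing in for Perl, that the `{10}` quantifier and the spelled-out form accept exactly the same strings on a handful of sample inputs chosen for illustration:

```python
import re

LONG_FORM = r"\*" * 10   # \* written out ten times
SHORT_FORM = r"\*{10}"   # the {10} quantifier repeats \* exactly ten times

# Runs of 0..14 asterisks, plus two mixed strings.
samples = ["*" * n for n in range(15)] + ["*****x*****", "a**********b"]

# Both patterns agree on whether each sample matches.
agree = all(
    bool(re.search(LONG_FORM, s)) == bool(re.search(SHORT_FORM, s))
    for s in samples
)
print(agree)  # True
```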


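To match more than ten asterisks anywhere in the body, not necessarily in a row, you need `tflags multiple`, which makes SpamAssassin count every match of a rule instead of stopping at the first; a meta rule can then test that count: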

body     __HAS_ASTERISK  /\*/
tflags   __HAS_ASTERISK  multiple maxhits=11
meta     TEN_ASTERISKS   __HAS_ASTERISK > 10
score    TEN_ASTERISKS   0.5
describe TEN_ASTERISKS   Message body contains more than ten asterisks.

The `maxhits=11` piece is an optimization: it tells SpamAssassin to stop counting after the 11th hit, since the meta rule only needs to know whether the count exceeds 10 (`tflags multiple` is expensive to run).
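To make the counting concrete, here is a rough Python model of what the three rules do together. It is only a sketch of the semantics described above (count matches of `/\*/`, stop at `maxhits`, then test `> 10`), not SpamAssassin's actual implementation; `count_hits` and `ten_asterisks` are hypothetical names.

```python
import re
from itertools import islice

ASTERISK = re.compile(r"\*")  # body   __HAS_ASTERISK /\*/
MAXHITS = 11                  # tflags __HAS_ASTERISK multiple maxhits=11
THRESHOLD = 10                # meta   TEN_ASTERISKS __HAS_ASTERISK > 10


def count_hits(body: str, maxhits: int = MAXHITS) -> int:
    """Count matches like `tflags multiple`, giving up once maxhits is reached."""
    # finditer yields matches lazily, so islice stops scanning the body early.
    return sum(1 for _ in islice(ASTERISK.finditer(body), maxhits))


def ten_asterisks(body: str) -> bool:
    """Model of the meta rule: true when the sub-rule hit more than ten times."""
    return count_hits(body) > THRESHOLD


scattered = "Buy* now* " * 6       # 12 asterisks, none adjacent
print(count_hits(scattered))       # 11 (capped by maxhits)
print(ten_asterisks(scattered))    # True
print(ten_asterisks("*" * 10))     # False: exactly ten is not "> 10"
```

Because `finditer` is lazy, capping it with `islice` mirrors the idea behind `maxhits`: no work is done past the 11th asterisk.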

Adam Katz
  • I was not aware of `tflags`; that is exactly what I was looking for! One step closer to eliminating unsolicited junk spam... As far as the parentheses go, would they be redundant if used like `/(\*)/`? I typically have parentheses in my basic rules that simply look for certain keywords, like Viagra, grouped together. For example, would `/(viagra|cialis)/i` be equivalent to `/viagra|cialis/i` in that case? – dub stylee Jul 07 '15 at 05:26
  • `/(\*)/` would capture the asterisk for future use, which you do not have, so the capture serves no purpose. `/(A|B)/i` is identical to `/A|B/i`, although `/(A|B)C/i` differs from `/A|BC/i`: the former requires a "C" after either letter, while the latter's "A" branch matches a lone "A", so only the latter matches "AX". `/(?:A|B)C/i` uses a non-capturing group and is slightly more efficient. `/(A|B)C\1/i` would match "aCA" but not "ACB". This is all basic regular expression stuff, not SpamAssassin-specific, so perhaps try a [regex tutorial](http://www.regular-expressions.info/quickstart.html) to start. – Adam Katz Jul 07 '15 at 16:34
  • Great, thank you for the clarification, as well as the additional resource. I will check it out; regular expressions have always been difficult for me to fully comprehend! – dub stylee Jul 07 '15 at 18:03
  • [Regex 101](https://regex101.com/) is a nice live experimentation resource as well. Its developer is great and has taken lots of feedback, and the interface is really slick. – Adam Katz Jul 07 '15 at 18:24
  • I had bookmarked regexpal.com before for testing regular expressions, but it is not that great. Regex 101 looks like it will be a really good tool for testing new regular expressions before I add them to SpamAssassin :) Thanks again! – dub stylee Jul 07 '15 at 18:52
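The grouping examples from the comments can be tried directly in Python's `re` module, whose syntax agrees with Perl's for these constructs. `matches` is a hypothetical helper that mimics an unanchored, case-insensitive `/.../i` rule:

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Unanchored, case-insensitive search, like a /.../i SpamAssassin rule."""
    return re.search(pattern, text, re.IGNORECASE) is not None


# (A|B) and A|B accept the same strings when nothing follows the group.
print(matches(r"(A|B)", "b"), matches(r"A|B", "b"))          # True True

# Grouping changes what C attaches to.
print(matches(r"(A|B)C", "AX"), matches(r"A|BC", "AX"))      # False True

# A non-capturing group matches the same text as a capturing one...
print(matches(r"(?:A|B)C", "bc"), matches(r"(A|B)C", "bc"))  # True True
# ...but defines no capture group.
print(re.compile(r"(?:A|B)C").groups, re.compile(r"(A|B)C").groups)  # 0 1

# \1 must repeat whatever group 1 captured (case-insensitively under /i).
print(matches(r"(A|B)C\1", "aCA"), matches(r"(A|B)C\1", "ACB"))  # True False
```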