
We are getting a lot of spam from addresses of the form femlmcte@tnbt.com.

I can see that cPanel's SpamAssassin allows a regex match to be used to filter mail.

I want to write a regex that identifies mail from addresses of the form

anylengthofstrring@abcd.com  

or

someotherlength@efgh.co.za

i.e. ONLY four letters after the @ and before the .com, .co.uk, .co.za, etc.

tripleee
  • And, what have you tried yourself so far? =) – J. Steen Sep 14 '12 at 10:09
  • What about all the non-spam that comes from `anylengthofstrring@abcd.com`? I wouldn't mark as spam **everything** that comes from four letter domains. – ZZ-bb Sep 14 '12 at 10:13
  • You don't have to; you can create a rule with a limited score, such as one, and hope that the problematic messages will have enough other spam markers to bring the score up above the threshold. This is fundamental to how SpamAssassin works. – tripleee Sep 14 '12 at 10:50
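The additive scoring described in the last comment can be sketched in a few lines of Python. This is only an illustration, not SpamAssassin's actual implementation: the rule names, the individual scores, the message fields, and the threshold of 5.0 are all assumptions made up for the example.

```python
import re

# Four letters between "@" and one of the listed TLDs, at the end of the address.
FOUR_LETTER_DOMAIN = re.compile(r"@[a-zA-Z]{4}\.(?:com|co\.uk|co\.za)$")

# (rule name, predicate, score). Names and scores are hypothetical.
RULES = [
    ("FOUR_LETTER_DOMAIN", lambda m: bool(FOUR_LETTER_DOMAIN.search(m["sender"])), 1.0),
    ("SUBJECT_ALL_CAPS", lambda m: m["subject"].isupper(), 2.5),
    ("BODY_HAS_MANY_LINKS", lambda m: m["body"].count("http") >= 3, 2.0),
]

THRESHOLD = 5.0  # assumed threshold for this sketch


def spam_score(message):
    """Sum the scores of every rule that fires for the message."""
    return sum(score for _name, fires, score in RULES if fires(message))


def is_spam(message):
    """A message is flagged only when the combined score reaches the threshold."""
    return spam_score(message) >= THRESHOLD
```

With these made-up numbers, a legitimate message from admin@food.com only picks up the one low-scoring rule and stays below the threshold, while a message that also trips the other rules is flagged.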

2 Answers


Try the following:

[^@]+@[a-zA-Z]{4}\.(com|co\.uk|co\.za)

Explanation:

[^@]+@                # One or more characters except @ followed by @
[a-zA-Z]{4}           # Exactly four letters in the ranges A-Z and a-z
\.(com|co\.uk|co\.za) # Matches .com, .co.uk, and .co.za

If you want to match any TLD, replace (com|co\.uk|co\.za) with .* (I'm assuming that since you are receiving these e-mails, you already know they are valid, so there is no need to validate the TLD).

Also note that with your current criteria, you'd treat mail from admin@food.com as spam, which may or may not be what you want.
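To check how the two variants behave before putting them into a filter, a quick Python sketch like the following may help. The helper name `matches` and the sample addresses (other than those quoted in the question) are made up for illustration. Python's `re` syntax is close enough to the Perl-style regexes SpamAssassin uses for these simple patterns, but that is an assumption worth verifying in your own setup.

```python
import re

# Pattern from the answer, limited to three TLDs.
THREE_TLDS = re.compile(r"[^@]+@[a-zA-Z]{4}\.(com|co\.uk|co\.za)")

# Variant with ".*" in place of the TLD list.
ANY_TLD = re.compile(r"[^@]+@[a-zA-Z]{4}\..*")


def matches(pattern, address):
    """Return True if the whole address matches (anchored at both ends)."""
    return pattern.fullmatch(address) is not None


if __name__ == "__main__":
    for address in ("femlmcte@tnbt.com", "someotherlength@efgh.co.za",
                    "admin@food.com", "user@example.com", "user@abcd.org"):
        print(address, matches(THREE_TLDS, address), matches(ANY_TLD, address))
```

Note that the pattern as written has no anchors, so an unanchored search would also hit an address such as user@abcd.com.example.org; `fullmatch` is used here to avoid that for the three-TLD variant. The `.*` variant accepts that address either way, because `.*` matches anything after the first dot.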

João Silva
  • Consider the possibility of more than three top domains... =) – J. Steen Sep 14 '12 at 10:10
  • @J.Steen: Yes, but I think OP doesn't want to match those. I suppose he only considers these *three* domains as *spam*. – João Silva Sep 14 '12 at 10:11
  • OP did say "etc", so I think he means more than *just* those top domains. In any case, just consider it. – J. Steen Sep 14 '12 at 10:12
  • @J.Steen: You're right, I didn't see the `etc`. I've edited my answer to reflect that. – João Silva Sep 14 '12 at 10:15
  • @JoãoSilva - Thank you for taking the time to explain the syntax. J.Steen - I started with ^\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,3}$ as a standard way to identify an e-mail format, and was getting my head around that before trying here :) – Arthur Mild Sep 14 '12 at 10:33

So you want to match an at-sign followed by 4 non-dot characters, followed by a dot? This specification translates easily into a regexp:

@[^.]{4}\.
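As a rough check of what this shorter pattern does and does not catch, here is a small Python sketch. The names `LOOSE` and `STRICT` and the sample addresses are invented for the comparison. `STRICT` is a letters-only variant added here as an assumption about what "four letters" in the question means.

```python
import re

# Pattern from this answer: "@", four characters that are not dots, then a dot.
LOOSE = re.compile(r"@[^.]{4}\.")

# Letters-only variant, for comparison.
STRICT = re.compile(r"@[a-zA-Z]{4}\.")

if __name__ == "__main__":
    for address in ("femlmcte@tnbt.com", "user@a-b1.com",
                    "user@example.com", "user@mail.example.com"):
        print(address,
              LOOSE.search(address) is not None,
              STRICT.search(address) is not None)
```

Two things stand out. `[^.]` also accepts digits and hyphens, so user@a-b1.com matches the loose pattern but not the strict one. Both patterns only look at the first label after the @, so a four-letter subdomain such as mail.example.com matches as well.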
pmakholm
  • Thank you. The plan is to see if we can filter these mails, then check the filter to catch any 'real' mails that happen to match. – Arthur Mild Sep 14 '12 at 10:30