I have read similar questions here, but because not all regular expression engines are created equal, I was not able to find a solution to my problem.
I am working on a rule for SpamAssassin that will tell whether the recipient's e-mail username is contained in the body of the message. For example, an e-mail sent to testuser@somedomain.com contains testuser in the body of the message. I have written and tested a regular expression on Regex-101, and it matches as expected, but when I create the rule, it does not match when I test it in SpamAssassin.
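For reference, the check the rule is meant to perform can be sketched in plain Python, outside SpamAssassin. This is only an illustration of the intent: the helper name `username_in_body` is hypothetical, and it assumes the recipient address and the body text are already available as strings.

```python
import re


def username_in_body(recipient: str, body: str) -> bool:
    """Hypothetical helper: True if the local part of `recipient` appears as a whole word in `body`."""
    # Everything before the first '@' is treated as the username.
    username = recipient.split("@", 1)[0]
    # \b on both sides so "testuser" does not match inside "contestuser".
    pattern = r"\b" + re.escape(username) + r"\b"
    return re.search(pattern, body, re.IGNORECASE) is not None


print(username_in_body("testuser@somedomain.com", "Hello testuser, ..."))
print(username_in_body("testuser@somedomain.com", "Hello contestuser"))
```

The regex in the question has to do the same thing in a single pattern, which is why it relies on a capture group and the `\1` backreference.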
Here is the expression:
/To:\s([a-z0-9][-a-z0-9]{1,19})\@somedomain\.com[a-z0-9\s=;:\/\.-]*\1\b/i
What it should do is match an e-mail address in the To: header (or anywhere in the body of the message matching the format To: user@somedomain.com) and then find the same username again later in the text. As I mentioned before, the expression matches as expected on Regex-101, but when I make a rule in SpamAssassin, it does not match.
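To see what each part of the expression requires, it can be run in Python's `re` module, which supports the same backreference (`\1`), word boundary (`\b`) and case-insensitive flag. This is a sketch for a standard backtracking engine like the one Regex-101 uses, not a model of how SpamAssassin applies the rule; the sample strings are made up.

```python
import re

# Pattern from the question, minus the Perl /.../i delimiters (the i becomes re.IGNORECASE).
PATTERN = re.compile(
    r"To:\s"                        # literal "To:" plus one whitespace character
    r"([a-z0-9][-a-z0-9]{1,19})"    # group 1: the username (2 to 20 characters)
    r"\@somedomain\.com"            # the fixed domain
    r"[a-z0-9\s=;:\/\.-]*"          # any run of these characters, newlines included
    r"\1\b",                        # the same username again, then a word boundary
    re.IGNORECASE,
)

samples = [
    "To: testuser@somedomain.com\n\ntestuser",   # username repeated later
    "To: testuser@somedomain.com\n\notheruser",  # a different word later
]
for text in samples:
    m = PATTERN.search(text)
    print(repr(text), "->", m.group(1) if m else None)
```

Note that the filler class `[a-z0-9\s=;:\/\.-]*` is what lets the match run across line breaks and header-style text such as `Content-Type: text/plain; charset=us-ascii` until it reaches the repeated username.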
If I remove the leading To:\s, then it does match, but I am only concerned with matching the e-mail in the To: header. I have tried these variations of the expression:
/To:\s([a-z0-9][-a-z0-9]{1,19})\@somedomain\.com[a-z0-9\s=;:\/\.-]*\1\b/i
/To: ([a-z0-9][-a-z0-9]{1,19})\@somedomain\.com[a-z0-9\s=;:\/\.-]*\1\b/i
/To:[\s]{0,2}([a-z0-9][-a-z0-9]{1,19})\@somedomain\.com[a-z0-9\s=;:\/\.-]*\1\b/i
/:\s([a-z0-9][-a-z0-9]{1,19})\@somedomain\.com[a-z0-9\s=;:\/\.-]*\1\b/i
/\s([a-z0-9][-a-z0-9]{1,19})\@somedomain\.com[a-z0-9\s=;:\/\.-]*\1\b/i
None of the rules above match in SpamAssassin, but this one does:
/([a-z0-9][-a-z0-9]{1,19})\@somedomain\.com[a-z0-9\s=;:\/\.-]*\1\b/i
Here is the text I am using for testing:
Subject: Test spam mail (GTUBE) private jet rental
Message-ID: <GTUBE1.1010101@example.net>
Date: Wed, 23 Jul 2003 23:30:00 +0200
From: Sender <sender@live.com>
To: recipient@somedomain.com
Precedence: junk
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

recipient
This is the GTUBE, the
Generic
Test for
Unsolicited
Bulk
Email
The rule should match the span from To: recipient@somedomain.com through recipient in the body, but I can only get it to match when I remove the To:\s from the expression. The full expression matches on Regex-101, so the problem seems to be specific to SpamAssassin, but I'm not sure.
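As a cross-check of the Regex-101 result in a second engine, here is a Python sketch that runs every variant from the question against the test message. It assumes a blank line between the headers and the body, as in a real message; it does not model how SpamAssassin prepares the text a rule sees, so it only confirms that the patterns themselves are valid.

```python
import re

# Test message from the question (blank line between headers and body assumed).
MESSAGE = """Subject: Test spam mail (GTUBE) private jet rental
Message-ID: <GTUBE1.1010101@example.net>
Date: Wed, 23 Jul 2003 23:30:00 +0200
From: Sender <sender@live.com>
To: recipient@somedomain.com
Precedence: junk
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

recipient
This is the GTUBE, the
Generic
Test for
Unsolicited
Bulk
Email
"""

# The part every variant in the question shares.
TAIL = r"([a-z0-9][-a-z0-9]{1,19})\@somedomain\.com[a-z0-9\s=;:\/\.-]*\1\b"
# The prefixes tried in the question; "" is the variant that matched in SpamAssassin.
PREFIXES = [r"To:\s", r"To: ", r"To:[\s]{0,2}", r":\s", r"\s", r""]

for prefix in PREFIXES:
    m = re.search(prefix + TAIL, MESSAGE, re.IGNORECASE)
    print(f"{prefix!r:16} -> {m.group(1) if m else None}")

# Full span matched by the original pattern: from the To: header to "recipient" in the body.
print(repr(re.search(PREFIXES[0] + TAIL, MESSAGE, re.IGNORECASE).group(0)))
```

In this engine all six variants capture `recipient`, which agrees with the Regex-101 behaviour described above.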
EDIT
Here is an updated version of the expression that does NOT allow a dash at the end of the username but still allows one in the middle:
/\bTo:\s([a-z0-9][-a-z0-9]{0,18}[a-z0-9])\@somedomain\.com[a-z0-9\s=;:\/\.-]*\b\1\b/i
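A quick Python comparison of the original and updated expressions, as a sketch in a standard engine rather than in SpamAssassin. Both username parts allow 2 to 20 characters (1 + {1,19} versus 1 + {0,18} + 1); the updated one additionally requires an alphanumeric last character. It also adds `\b` before `To:` and before `\1`, which stops the username from matching as the tail of a longer word; the sample text is made up.

```python
import re

ORIGINAL = re.compile(
    r"To:\s([a-z0-9][-a-z0-9]{1,19})\@somedomain\.com[a-z0-9\s=;:\/\.-]*\1\b", re.I
)
UPDATED = re.compile(
    r"\bTo:\s([a-z0-9][-a-z0-9]{0,18}[a-z0-9])\@somedomain\.com[a-z0-9\s=;:\/\.-]*\b\1\b",
    re.I,
)

# Username part of the updated pattern on its own: 2 to 20 characters,
# alphanumeric at both ends, dashes allowed only in between.
USERNAME = re.compile(r"[a-z0-9][-a-z0-9]{0,18}[a-z0-9]", re.I)

for name in ["te-st", "test-", "-test", "ab", "a"]:
    print(name, bool(USERNAME.fullmatch(name)))

# "testuser" appears only inside "contestuser" in the body.
text = "To: testuser@somedomain.com\n\ncontestuser"
print("original:", bool(ORIGINAL.search(text)))  # True: \1\b accepts a suffix match
print("updated: ", bool(UPDATED.search(text)))   # False: \b\1\b needs a whole word
```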