This worked in procmail, but it seems procmail was abandoned in Sept 2001. I had a rule that would sense when utf-8 was used in the 'To:' header to write my name using emoji or non-Latin characters. When I try the same in Dovecot's Sieve implementation "Pigeonhole", I am frustrated because it seems to discard some of the data.

ref. Sieve rules in RFC5228
ref. Dovecot Pigeonhole implementation

What I tried:

require ["fileinto"];
if header :contains ["to", "from"] "=?utf-8?B?" {   fileinto "Junk"; }
elsif address :contains :all ["to", "from"] "=?utf-8?B?" {   fileinto "Junk"; }

With this example data:

From: "=?utf-8?B?TWluaSBXdQ==?=" <mini@aonerivertech.com>
To: "=?utf-8?B?Q1VTVA==?=" <moses@example.com>
Subject: =?utf-8?B?UmU6TWljcm9jaGlwIFRleGFzIE9mZmVy?=
Date: Mon, 20 Mar 2023 16:12:50 +0900

Hello potential customer! Please stop whatever you're
doing and pay attention to me!

What I get:

sieve-test -Tlevel=matching -t - /tmp/badmail.sieve /tmp/badmail.txt

      ## Started executing script 'badmail'
   2: header test
   2:   starting `:contains' match with `i;ascii-casemap' comparator:
   2:   extracting `to' headers from message
   2:   matching value `"CUST" <moses@example.com>'
   2:     with key `=?utf-8?B?' => 0
   2:   extracting `from' headers from message
   2:   matching value `"Mini Wu" <mini@aonerivertech.com>'
   2:     with key `=?utf-8?B?' => 0
   2:   finishing match with result: not matched
   2: jump if result is false
   2:   jumping to line 3
   3: address test
   3:   starting `:contains' match with `i;ascii-casemap' comparator:
   3:   extracting `to' headers from message
   3:   parsing address header value `"=?utf-8?B?Q1VTVA==?=" <moses@example.com>'
   3:   address value `moses@example.com'
   3:   extracting `all' part from address <moses@example.com>
   3:   matching value `moses@example.com'
   3:     with key `=?utf-8?B?' => 0
   3:   extracting `from' headers from message
   3:   parsing address header value `"=?utf-8?B?TWluaSBXdQ==?=" <mini@aonerivertech.com>'
   3:   address value `mini@aonerivertech.com'
   3:   extracting `all' part from address <mini@aonerivertech.com>
   3:   matching value `mini@aonerivertech.com'
   3:     with key `=?utf-8?B?' => 0
   3:   finishing match with result: not matched
   3: jump if result is false
   3:   jumping to line 3
      ## Finished executing script 'badmail'

Implicit keep:  store message in folder: INBOX

It records the "=?utf-8?B?..." in the trace output, so I know it knows. But the 'header' test and the 'address' test both discard that data before executing. I also tried the :comparator "i;octet" instead of the default "i;ascii-casemap" with the same results.

How can I test the raw headers instead of these interpreted values?

  • What do you need the raw encoded form for? Why not just apply a regex on the decoded value? – anx Mar 20 '23 at 17:02
  • Ooof, I forgot how exacting people are here. Yes, I could regex for [\x7f-\xff], but that doesn't get me what I previously enjoyed with procmail. procmail could tell the difference between "=?utf-8?B?TW9zZXM=" and "Moses", but as far as I can tell Dovecot's Sieve implementation cannot. Filtering on this substring was a useful tool for fighting spam and I hoped I wouldn't have to do without it. – Moses Moore Mar 24 '23 at 15:20

1 Answer

So... you are not actually looking to distinguish on "emoji or non-Latin characters", but on the specifics of how‡ characters are transmitted on the wire?

I cannot think of a way to make Sieve go back to the raw bytes. You could work around this by doing the matching in the mail server instead, for example by using the Postfix header_checks feature (which is RFC2047-ignorant) to prepend a custom header:

# header_checks = pcre:/etc/postfix/maps/remember_header_encoding
#  pcre is case insensitive by default
/^To:.*=\?utf-8\?B\?/   PREPEND X-Preserve-For-Sieve: RFC2047 marker in header To:

And then check for the existence of such marker headers in sieve.
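A minimal sketch of that check, assuming the header_checks rule above is active and the marker header keeps the name X-Preserve-For-Sieve; the "Junk" folder is simply the one used in the question:

```sieve
require ["fileinto"];

# X-Preserve-For-Sieve is the marker header that the Postfix
# header_checks rule prepends when it sees "=?utf-8?B?" in the raw To: line.
# Sieve never sees the raw encoding, only whether the marker is present.
if exists "X-Preserve-For-Sieve" {
    fileinto "Junk";
}
```

Since `exists` only tests for the presence of the header, the text after the colon in the PREPEND action does not matter. The rule above only marks To:, so covering From: as in the question would presumably need a second pattern of the same shape.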


Even if it works today, I doubt it will remain a reliable sorting criterion for the foreseeable future. Any relaying SMTP server, up to and including the one handing the message to Sieve, might add encoding where there previously was none as part of message transformations. Some mail clients add encoding where none is needed; others fail to add it even though they should. Detecting a difference where none was intended is unlikely to consistently single out the same kinds of messages.


‡ a choice other than superfluous encoding is rare with regular mail - Dovecot does not yet guarantee 8-bit-clean transports such as SMTPUTF8

anx