I have a webmail system and for some time now have noticed that emails originating from a few servers come with an extra line break in the headers.

It started with the occasional DMARC report from Google, about one a week. Then it appeared in other automated emails from Bluebottle, and occasionally in mail from various other senders.

In the emails from Google and Bluebottle the extra line is always in the same place. In the rest it is not, and that is the problem. I can run a simple regex to fix the ones where the break is always in the same place, but for the others I would not like to tempt fate and cause more harm than good.

I have noticed that major providers usually don't bother correcting this, but I would like to.

I have built this regex:

(\r[a-z-]*:.*)+(\r\r)+([a-z-]*:.*\r)+

So far it seems to work, but I am concerned it might cause problems.

Since regular expressions this broad are not recommended, I would like some opinions from anyone who has encountered this issue.

transilvlad
  • Your concern is legit. Why do you bother? Is this extra carriage return causing problems? And how are you using or do you intend to use this regex? – Lodewijk Bogaards May 25 '13 at 16:40
  • Well, before I use this one I would like some feedback on whether there is a better solution. This issue is causing problems with my automated bounce, unsubscribe and DMARC processor, since the break is in the automated emails, not the regular ones. – transilvlad May 25 '13 at 21:53
  • RFC 822 says there should be one CRLF after each header line and one more CRLF before the message body, so once you find two CRLFs in a row, everything after them should be the body. If there are headers there, the sending software doesn't follow RFC 822. Also, your regex has only CR (\r) without LF (\n); if that works, where did all your \n go? – Puggan Se May 31 '13 at 18:40
  • Ah, sorry, I used \r only due to the limitation in this nice tool http://gskinner.com/RegExr/ – transilvlad May 31 '13 at 19:05

1 Answer

After a month of testing, this appears to work quite well, with no issues so far.

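// Drop the stray blank line between two header-like lines. The second colon is not captured by any group, so it is written back literally.
$data = preg_replace("/(\r\n)([a-z-]*)(:)(.*)(\r\n)(\r\n)([a-z-]*):(.*)(\r\n)/i", "$1$2$3$4$6$7:$8$9", $data);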
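
The one-liner above runs over the whole message, so it would also join a body that happens to open with a line such as "Note: ..." onto the headers, and it could touch header blocks embedded further down (for example inside a bounce). A more conservative variant, as a minimal sketch only: look at the first blank line alone, which per RFC 822 is where the headers should end, and remove it only if the next line starts with a field name from an allow-list. The function name `repair_first_header_break` and the allow-list contents are hypothetical, not something from my production setup.

```php
<?php
// Hypothetical helper: repair at most one stray blank line, and only the first
// one in the message, when the line right after it starts with a known header name.
function repair_first_header_break(string $data, array $knownHeaders): string
{
    // The first CRLF CRLF is where a well-formed message ends its headers.
    $pos = strpos($data, "\r\n\r\n");
    if ($pos === false) {
        return $data; // no blank line at all, nothing to repair
    }

    // Everything after the blank line.
    $rest = substr($data, $pos + 4);

    // Build an alternation such as "subject|content\-type" from the allow-list.
    $names = implode('|', array_map(function ($h) {
        return preg_quote($h, '/');
    }, $knownHeaders));

    // If the "body" starts with a known header field, treat the blank line as stray.
    if (preg_match('/^(?:' . $names . '):/i', $rest) === 1) {
        return substr($data, 0, $pos + 2) . $rest;
    }

    return $data;
}

// Sample message with a stray blank line before "Subject:" (made-up data).
$raw = "From: a@example.com\r\nTo: b@example.com\r\n\r\n"
     . "Subject: Report\r\nContent-Type: text/plain\r\n\r\n"
     . "Note: body text\r\n";

$fixed = repair_first_header_break($raw, ['subject', 'content-type']);
```

Running it a second time on `$fixed` leaves the message unchanged, because the first blank line is then followed by "Note:", which is not in the allow-list. The trade-off is that a message with more than one stray break needs more than one pass, and a body that really does begin with an allow-listed name would still be misread.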
transilvlad