I'm having problems sending HTML emails with long lines of text. The WYSIWYG editor (FCKEditor 2.5) used on the site removes all the \n characters in certain browsers, including IE and Chrome. The result is an email with a single huge line of text. This wouldn't be a problem if it weren't for email clients that wrap lines of over 998 characters by inserting ! \n into them. Of course, these breaks almost always end up in the most unfortunate places, breaking HTML tags and looking nasty in the content itself.

My initial solution was to add a line feed after every HTML tag or every 900 to 990 characters. This is the regex I ended up with:

 return preg_replace("/(<\/[^\>]+>|<[^\>]+\/>|>[^<]{900,990}\s)(\n)*/","$1\n",$str);

However, when a line doesn't contain any tags at all, the whitespace-matching part is never triggered. But if I remove the > from its beginning, it starts breaking tags.

Is there a better way than regex to do this, or can this regex be healed?

EDIT: The 1000 character line length limit is defined in RFC 821.
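
For the tag-free case described above, one possible non-regex-only approach is to split the markup into tags and text runs and turn an existing whitespace character in the text into a line feed once the current line has grown past a soft limit. This is only a minimal sketch: `breakLongLines` and its `$softLimit` parameter are hypothetical names, it assumes every `<...>` run is a tag, and it cannot shorten a single tag or word that is itself longer than the limit.

```php
<?php
// Hypothetical helper, not from the original post: replaces a whitespace
// run in the text with a line feed once the current line has reached
// $softLimit bytes. Tags are copied through untouched.
function breakLongLines(string $html, int $softLimit = 900): string
{
    // Split into tags and text runs; the capture group keeps the tags.
    $parts = preg_split('/(<[^>]*>)/', $html, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $out = '';
    $lineLen = 0;
    foreach ($parts as $part) {
        $isTag = preg_match('/^<[^>]*>$/', $part) === 1;
        // Tags stay whole; text is split into words and whitespace runs.
        $tokens = $isTag
            ? [$part]
            : preg_split('/(\s+)/', $part, -1,
                PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

        foreach ($tokens as $token) {
            if (!$isTag && $lineLen >= $softLimit && ctype_space($token)) {
                $token = "\n"; // break between words, never inside a tag
            }
            $out .= $token;
            // Track the byte length of the line currently being built.
            $nl = strrpos($token, "\n");
            $lineLen = $nl === false
                ? $lineLen + strlen($token)
                : strlen($token) - $nl - 1;
        }
    }
    return $out;
}
```

Because a whole whitespace run is replaced by one line feed, the rendered result should stay the same except inside elements where whitespace is significant, such as pre.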

Kaivosukeltaja
  • There is: modify the WYSIWYG editor's output formatting, or replace \n with
    <br />, either with JS before sending the data or on the server before sending the e-mail.
    – Deele Mar 31 '11 at 11:16
  • @Deele: The editor's (FCKEditor 2.5) formatting shouldn't strip away the newlines, and on Firefox it doesn't. We don't want to add visual
    <br /> tags; we just want to keep it from turning into one continuous line when sending it.
    – Kaivosukeltaja Mar 31 '11 at 11:26
  • I would try to pass the html string through [tidy::repairString](http://php.net/manual/en/tidy.repairstring.php) with the clean config option on – Yann Milin Mar 31 '11 at 12:40
  • Thank you! Sometimes one searches for a simple solution like yours. – Tomas Kaidl Sep 08 '21 at 17:55

2 Answers

3

Following my comment, I'm posting this as I have been able to run a test.

tidy::repairString should do the job just fine, better than any regex solution.

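// Requires the PHP tidy extension.
$content = "<html>......</html>";
$oTidy = new tidy();
// repairString() re-serializes the markup, and tidy wraps long lines in its output.
$content = $oTidy->repairString($content,
    array("show-errors" => 0, "show-warnings" => false),
    "utf8"
);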

Adapt the charset parameter (the third one) to your needs.

The clean option is unneeded for this; I was wrong in my comment.
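
If relying on tidy's default wrapping feels too implicit, the wrap column can be set through tidy's `wrap` configuration option. The following is a sketch under assumptions rather than a drop-in implementation: `wrapHtmlWithTidy` and the default column of 76 are illustrative choices, and the tidy extension must be loaded.

```php
<?php
// Sketch only: wraps the call shown in the answer in a hypothetical helper
// and sets tidy's "wrap" option explicitly instead of relying on the default.
function wrapHtmlWithTidy(string $html, int $column = 76): string
{
    if (!class_exists('tidy')) {
        throw new RuntimeException('The tidy extension is not loaded.');
    }

    $tidy = new tidy();
    $result = $tidy->repairString(
        $html,
        array(
            "wrap" => $column,        // column at which tidy wraps text
            "show-errors" => 0,
            "show-warnings" => false,
        ),
        "utf8"
    );

    if ($result === false) {
        throw new RuntimeException('tidy could not repair the markup.');
    }
    return $result;
}
```

Tidy wraps at whitespace, so a very long unbreakable token such as a long URL can presumably still produce an overlong line; it is worth checking the output length before sending.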

Yann Milin
  • This is exactly what I needed; I didn't know tidy would do this too. Thank you! – Kaivosukeltaja Mar 31 '11 at 13:47
  • But isn't it doing a lot of stuff that you don't need and that could possibly change your whole HTML? – Horen Oct 09 '12 at 16:23
  • @Horen: Yes, it does a lot of other stuff too, which can be turned on or off if needed. In our case changing the HTML doesn't matter, as long as the content and DOM structure remain the same. Here's a list of what tidy does: http://tidy.sourceforge.net/docs/quickref.html – Kaivosukeltaja Oct 10 '12 at 04:56
  • @Kaivosukeltaja Thanks for your post. I've seen the reference table but even when I turn off everything it will still change source code (e.g. the anchor-as-name option). I'll look into it again but I think a custom solution might be safer here... – Horen Oct 10 '12 at 07:35
0

If I understand everything correctly, you don't need to concern yourself with lines that don't contain HTML at all - these can be left to be handled by email clients.

Ansis Māliņš
  • The problem is that many email clients, including Outlook, don't handle long lines correctly. The result is an exclamation mark as every 1000th character in our email. – Kaivosukeltaja Mar 31 '11 at 11:28
  • That's not a client-side issue; the limit is in the SMTP protocol and thus enforced one way or another by most mail servers. – tripleee Feb 03 '15 at 13:17
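
Whichever approach is used, a quick check before sending can show whether a body would run into the limit discussed in these comments. This is a small sketch; `longestLineLength`, `exceedsLineLimit` and `MAX_LINE_LENGTH` are hypothetical names, and lengths are counted in bytes.

```php
<?php
// 998 characters of content plus the terminating CRLF make up the
// 1000-character line limit.
const MAX_LINE_LENGTH = 998;

// Hypothetical helper: byte length of the longest line in $body,
// not counting the line terminator.
function longestLineLength(string $body): int
{
    $lengths = array_map('strlen', preg_split('/\r\n|\r|\n/', $body));
    return max($lengths);
}

// True when at least one line is longer than the limit allows.
function exceedsLineLimit(string $body): bool
{
    return longestLineLength($body) > MAX_LINE_LENGTH;
}
```

A caller could log or reject the message when `exceedsLineLimit($body)` returns true, instead of discovering stray exclamation marks in the delivered mail.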