0

I have a RoundCube plugin that writes the message body to the database, after which I need to parse the data into another table. Using certain RoundCube functions I can remove all HTML tags, replacing each </td> with '\n' and each </tr> with '\n\n'. This makes parsing my data very easy and robust. There is only one drawback: the HTML data is broken into fixed-length lines with an = at the end, e.g.:

<td valign=3D"bottom" style=3D"color:#444444;padding:5px 10px 5=
px 0px;font-size:12px;border-bottom:1px solid #eeeeee;"><b>Discount</b></td=
><td valign=3D"bottom" align=3D"right" style=3D"color:#444444;padding:5px 0=
px 5px 0px;font-size:12px;border-bottom:1px solid #eeeeee;text-align:right;=
"><b>Price after discount</b></td>

Now the </td= fragments are not recognised, so Discount is joined to Price after discount as DiscountPrice after discount\n instead of Discount\n Price after discount\n. This happens all the way through the message and is causing me severe problems.

I tried to remove the = and break with things like:

$msg_body = str_replace('=', '', $msg_body);
$msg_body = str_replace('=\n', '', $msg_body);
$msg_body = str_replace('= ', '', $msg_body);

with no real success. I do not know which type of break comes after the = sign, whether it is a line break or a paragraph break. I tried to find out, but in vain; I even looked at the RoundCube code. Echoing out the HTML did not reveal anything to me either.

I am posting this here as a general PHP and HTML question in the hope that someone can help me simply remove these = signs and the mysterious (to me) breaks so that

</td=
>

becomes

</td>

, etc.

Pshemo
Johan Marais
  • Search for `decode quoted-printable`, which is what you're trying to do. There's more to it than just removing equal signs and newlines. – dldnh Mar 25 '12 at 12:57
  • If you run str_replace('=', '', $msg_body); before str_replace('=\n', '', $msg_body); there is normally no =\n left to detect... – Kharaone Mar 25 '12 at 13:02
  • I used them one at a time and not all 3 in one go, but you're right, one must be careful not to put unnecessary lines of code in. – Johan Marais Mar 25 '12 at 15:10
  • Removing the newlines and equal signs was just a makeshift solution because I could not figure out what I needed to remove; `decode quoted-printable` is the thing I needed. – Johan Marais Mar 25 '12 at 15:12

3 Answers

4

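The =XY notation is part of the (old-school but still used!) quoted-printable encoding, which represents 8-bit data using only 7-bit ASCII characters. Every byte outside the printable ASCII range, as well as the = sign itself, is encoded as = followed by two hexadecimal digits giving the byte value, e.g. =F3. Encoded lines are also limited to 76 characters, so longer lines are wrapped with a "soft line break": a = at the very end of a line, which is what you see in </td=.

For example, if you take a closer look at your HTML tags, the = itself is encoded as =3D.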

Read more at Wikipedia on quoted-printable

To decode the message back to normal HTML, you must apply quoted_printable_decode() to the string.

$msg_body = quoted_printable_decode($msg_body);
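
As a rough illustration of what the call does, here is a minimal sketch. The sample string is a shortened, hypothetical version of the markup in the question, and it assumes the soft line break is "=" followed by CRLF, as is usual in mail; the same call also copes with a bare "\n".

```php
<?php
// Hypothetical sample: quoted-printable HTML with "=3D" escapes and one
// soft line break ("=" followed by CRLF) in the middle of a closing tag.
$msg_body = "<td valign=3D\"bottom\"><b>Discount</b></td=\r\n"
          . "><td align=3D\"right\"><b>Price after discount</b></td>";

// Decodes "=3D" back to "=" and removes the soft line break.
$decoded = quoted_printable_decode($msg_body);

echo $decoded, PHP_EOL;
// <td valign="bottom"><b>Discount</b></td><td align="right"><b>Price after discount</b></td>
```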
Kaii
  • This single line of code solved all my miseries with this! Thanks for that. I assume they still use it to ensure that most email readers can read the email properly, because their website uses the newest technologies. – Johan Marais Mar 25 '12 at 15:05
  • @JohanMarais thats right. you're welcome. please accept this as the correct answer if it helped you. thanks. http://stackoverflow.com/faq#howtoask – Kaii Mar 25 '12 at 15:12
0

For escape sequences such as \n to be interpreted, you have to use double quotes (") in PHP:

$msg_body = str_replace("=\n", '', $msg_body);

Otherwise, PHP will look for the literal three-character string =\n (an equals sign, a backslash and the letter n).
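
The difference can be checked directly. This is only a small sketch with a made-up input string; it assumes the break after the = is a single "\n", and it leaves encoded characters such as =3D untouched.

```php
<?php
// Single quotes: "=", a backslash and the letter "n" (three characters).
var_dump(strlen('=\n')); // int(3)
// Double quotes: "=" and one line-feed character (two characters).
var_dump(strlen("=\n")); // int(2)

// Hypothetical input with a soft line break inside the closing tag.
$msg_body = "</td=\n>";

$single = str_replace('=\n', '', $msg_body); // no match, string unchanged
$double = str_replace("=\n", '', $msg_body); // "=" and the line feed removed
```

If the message uses "\r\n" line endings, the search string would need to be "=\r\n" instead.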

Yogu
-1

Depending on the system you're using, the line break can be:

\n
\r
\r\n

So check for those too.
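
One way to check for all three at once is to pass an array of search strings to str_replace(). The helper name strip_soft_breaks below is hypothetical, and the sketch only removes the = plus line break; it does not decode =3D or other escapes.

```php
<?php
// Hypothetical helper: removes "=" followed by any of the three
// line-break styles listed above.
function strip_soft_breaks(string $text): string
{
    // "=\r\n" is listed first so that a CRLF pair is removed as a whole
    // instead of leaving a stray "\n" behind.
    return str_replace(["=\r\n", "=\n", "=\r"], '', $text);
}

$unix    = strip_soft_breaks("</td=\n>");   // LF
$windows = strip_soft_breaks("</td=\r\n>"); // CRLF
$oldmac  = strip_soft_breaks("</td=\r>");   // CR

var_dump($unix, $windows, $oldmac); // each is "</td>"
```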

You can also use a regular expression that removes the = together with whichever line break follows it:

$msg_body = preg_replace('/=(?:\r\n|\r|\n)/', '', $msg_body);

In your case, it should transform </td= followed by a line break and > into </td>. Encoded characters such as =3D are left as they are.

Yogu
Kharaone