
I came across the following error in PHP generated by an email forwarded from a Yahoo account:

Notice: Unknown: Invalid quoted-printable sequence: =?UTF-8?Q?ck-off with Weekly Sale up to 90% off (errflg=3) in Unknown on line 0

I've spent hours researching this issue and decided to send myself the exact same output string in an email without having Yahoo involved. The original q-encoded text that decodes correctly:

=?UTF-8?Q?GOG_Forward=3A_Fw=3A_=F0=9F=98=89_A_great_Monday_kick-?= =?UTF-8?Q?off_with_Weekly_Sale_up_to_90=25_off?=

The malformed q-encoded text from Yahoo:

=?UTF-8?Q?GOG_Forward =?UTF-8?Q?ck-off_with_Weekly_Sale_up_to_90%_off?=

The correct string when decoded:

GOG Forward: Fw: A great Monday kick-off with Weekly Sale up to 90% off

Roundcube manages to decode both the normal and the malformed text, though I'm not sure how. Its 25 megabytes of source is a bit much to dig through, and I haven't even been able to determine where it decodes subject headers.

How do I fix Yahoo's malformed version of q-encoding?

<?php
//These fail:
echo imap_mime_header_decode($mail_message_headers['Subject']);
echo quoted_printable_decode($mail_message_headers['Subject']);
?>

For clarification, the imap_fetchstructure page states that the encoding value 4 means Quoted-Printable / ENCQUOTEDPRINTABLE.


New Development

It turns out that, for some reason, Yahoo sends the subject twice for the same header: one is malformed and the other is not. Here is the Subject header from the raw email:

Subject: =?UTF-8?Q?GOG_Forward:_Fw:_=F0=9F=98=89_A_great_Monday_ki?=
 =?UTF-8?Q?ck-off_with_Weekly_Sale_up_to_90%_off?=
MIME-Version: 1.0
John
  • `quoted_printable_decode('=?UTF-8?Q?GOG_Forward =?UTF-8?Q?ck-off_with_Weekly_Sale_up_to_90%_off?=')` does work. – Olivier Oct 16 '20 at 19:04
  • @Olivier That outputs `=?UTF-8?Q?GOG_Forward =?UTF-8?Q?ck-off_with_Weekly_Sale_up_to_90%_off?` which is incorrect. – John Oct 16 '20 at 19:12
  • As I understand it, what you're trying to accomplish is 1) to be able to decode strings similar to `=?UTF-8?Q?...`, and 2) to be able to properly import a message from Yahoo? – Patrick Oct 21 '20 at 00:27
  • The question https://stackoverflow.com/questions/64589910/fix-duplicate-ids-in-php-html-dom-to-be-converted-to-xml#comment114226738_64589910 looks interesting and clear to me, though I'm not a SME. I'd vote to reopen, you might ask in the PHP chat room if anything seems unclear about it, they might help too – CertainPerformance Oct 30 '20 at 03:43

2 Answers


I created a solution that uses Roundcube's source code to decode the message.

I posted the code and demo:

  • You can see it here
  • Click the big play button to preview the extraction
  • Go to code tab to see the extracted Roundcube code that you could use for your project

Since you mentioned not wanting classes in the example, I extracted Roundcube's decode_mime_string() function from rcube_mime, plus a few things from rcube_charset such as $aliases, parse_charset(), and convert().
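
As a rough illustration of what a header decoder of this kind has to do, here is a minimal sketch. It is not Roundcube's code: decode_header_sketch() is a hypothetical name, it uses only core PHP functions, and it assumes the header text is already UTF-8, so the declared charset is matched but ignored. It unfolds the header, drops the whitespace between adjacent encoded-words, and then decodes each one.

```php
<?php
// Hypothetical helper, not taken from Roundcube or from the question.
// Assumption: every encoded-word is UTF-8, so no charset conversion is done.
function decode_header_sketch(string $raw): string
{
    // 1. Unfold: a line break followed by a space or tab continues the header.
    $value = preg_replace('/\r?\n(?=[ \t])/', '', $raw);

    // 2. Whitespace between two adjacent encoded-words is not part of the text.
    $value = preg_replace('/(\?=)\s+(?==\?)/', '$1', $value);

    // 3. Decode each =?charset?encoding?text?= token.
    return preg_replace_callback(
        '/=\?([^?\s]+)\?([QqBb])\?([^?\s]*)\?=/',
        function (array $m): string {
            if (strtoupper($m[2]) === 'B') {
                // B encoding is plain base64.
                return (string) base64_decode($m[3]);
            }
            // Q encoding: "_" stands for a space, "=XX" is a hex-encoded byte.
            return quoted_printable_decode(str_replace('_', ' ', $m[3]));
        },
        $value
    );
}

// The raw Subject value from the question, including the folded second line.
$raw = "=?UTF-8?Q?GOG_Forward:_Fw:_=F0=9F=98=89_A_great_Monday_ki?=\r\n"
     . " =?UTF-8?Q?ck-off_with_Weekly_Sale_up_to_90%_off?=";

echo decode_header_sketch($raw), "\n";
```

The sketch deliberately skips charset conversion and error handling; a real decoder would convert each encoded-word from its declared charset, which is the part the extracted rcube_charset pieces cover.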


As far as decoding the malformed text from Yahoo:

=?UTF-8?Q?GOG_Forward =?UTF-8?Q?ck-off_with_Weekly_Sale_up_to_90%_off?=

Into this:

GOG Forward: Fw: A great Monday kick-off with Weekly Sale up to 90% off

It's impossible: there isn't enough data in that string. For example, it's missing the " A great Monday ki" portion. Do you have the full source of the email?

Patrick
  • I can post the raw email headers if you'd like? Each email is just a file. – John Oct 21 '20 at 01:06
  • That would definitely help. Also possible to include the context of how you got this file? – Patrick Oct 21 '20 at 01:17
  • I found the *raw email* and updated the bottom of my post. It turns out `mb_decode_mimeheader('=?UTF-8?Q?GOG_Forward:_Fw:_=F0=9F=98=89_A_great_Monday_ki?=')` works fine though how does Roundcube handle this oddity with two lines for the subject and discards the invalid line? WTF is wrong with the devs at Yahoo? – John Oct 21 '20 at 01:42
  • @John *"how does Roundcube handle this oddity"* This "oddity" is perfectly standard, as Patrick pointed out. – Olivier Oct 21 '20 at 19:10
  • Patrick, good stuff! I can actually read your version of the code. I dumped the raw header as a fourth string and tested it out. It's ridiculous how much nonsense we have to deal with just to show a simple subject line. Thank you; accepted *and +1*. – John Oct 22 '20 at 00:22

You don't actually need any third-party solution. There is already a built-in IMAP function for decoding strings like the one you gave, imap_utf8, and it works fine. Here's an example taken from your question.


<?php
    echo imap_utf8('=?UTF-8?Q?GOG_Forward:_Fw:_=F0=9F=98=89_A_great_Monday_ki?=
 =?UTF-8?Q?ck-off_with_Weekly_Sale_up_to_90%_off?=');
//GOG Forward: Fw:  A great Monday ki ck-off with Weekly Sale up to 90% off

?>


As for why quoted_printable_decode threw that error: your string is a UTF-8 MIME encoded header value, not plain quoted-printable text.

mohiwalla