I'm having a serious problem decoding IMAP headers. I received an email whose subject appears to be encoded in windows-874, and this makes the whole message unreadable. I tried iconv('tis-620','utf-8',$txt), but had no luck.

I've searched everywhere for an answer, but it seems as if nobody has run into this problem before (or perhaps I'm not searching for the right terms).

The subject is:

Charset: ASCII

=?windows-874?Q?=CB=E9=CD=A7=BE=D1=A1=C3=D2=A4=D2=BE=D4=E0=C8=C9=CA=D3=CB=C3=D1=BA=A7=D2=B9=E4=B7=C2=E0=B7=D5=E8=C2=C7=E4=B7=C2=A4=C3=D1=E9=A7=B7=D5=E8 30 =E2=C3=A7=E1=C3=C1=CA=C7=D1=CA=B4=D5=CA=D8=A2=D8=C1=C7=D4=B7=AB=CD=C2 8?=

So please tell me what the encoding is, if it's not tis-620. How can I decode this into human-readable text?

ElGavilan
Wilf
  • Read [this](http://stackoverflow.com/questions/8701269/how-to-decode-q-encoding-in-c). – Jabberwocky Apr 23 '14 at 14:44
  • Thanks for the comment, but it didn't work! – Wilf Apr 23 '14 at 15:00
  • How did it not work? What did you do? 'It did not work' is not enough information for us to help you. This is RFC 2047 encoding. – Max Apr 23 '14 at 19:52
  • @Max Didn't I say it clearly above? – Wilf Apr 24 '14 at 05:06
  • My mail parser/decoder turns that into "ห้องพักราคาพิเศษสำหรับงานไทยเที่ยวไทยครั้งที่ 30 โรงแรมสวัสดีสุขุมวิทซอย 8", which Google translates as "Rates for Thailand Travel Thailand 30 times Sawasdee Sukhumvit Soi 8". Hm. Is that plausible? I've no idea. – arnt Apr 24 '14 at 08:03
  • @arnt Would you mind showing me the code sample? So I can learn how to make it so simple. – Wilf Apr 24 '14 at 14:44
  • It involves a few thousand lines of code. There's an RFC2047 decoder, then a charset guesser and conversion to unicode. (I know it says cp874. Sometimes you can trust what that says and sometimes you cannot.) – arnt Apr 24 '14 at 17:45

1 Answer

I finally found a way. First I wrote two functions that detect which charset a given text mentions.

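// Returns the offset of "windows-874" in $str, or false if it is absent.
// stripos() is used because charset names are case-insensitive.
function win874($str){
    return stripos($str,"windows-874");
}

// Returns the offset of "UTF-8" in $str, or false if it is absent.
function utf8($str){
    return stripos($str,"UTF-8");
}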

Then I handle each case with PHP functions:

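// Neither charset is mentioned: print the subject unchanged.
if(win874($headers->subject)===false and utf8($headers->subject)===false){
    echo $headers->subject;
}
// windows-874: index 3 holds the text part of =?charset?encoding?text?=
if(win874($headers->subject)!==false){
    $subj0=explode("?",$headers->subject);
    echo $subj0[3];
}
// UTF-8: let the IMAP extension decode the header.
if(utf8($headers->subject)!==false){
    echo imap_utf8($headers->subject);
}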

Because a windows-874 subject always begins with "=?windows-874?Q?", I used a simple function, explode(), to separate the actual text from the surrounding markers. The text always comes after the third question mark, so it ends up at index 3 of the resulting array. That gives me the subject.
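To illustrate the split, here is a minimal sketch on a shortened sample in the same layout (the three bytes are taken from the subject in the question; the variable names are only for illustration). It also shows that the piece at index 3 is still Q-encoded: turning each =XX into a byte yields windows-874 bytes, not UTF-8.

```php
<?php
// Shortened sample in the same =?charset?encoding?text?= layout.
$subject = '=?windows-874?Q?=E4=B7=C2?=';

// Splitting on "?" gives: "=", "windows-874", "Q", "=E4=B7=C2", "="
$parts = explode('?', $subject);
$charset  = $parts[1];
$encoding = $parts[2];
$text     = $parts[3];

// Q encoding is close to quoted-printable: each =XX stands for one byte.
$bytes = quoted_printable_decode($text);

// Show the raw bytes as hex; they are still in the declared charset.
echo $charset, ' ', $encoding, ' ', bin2hex($bytes), "\n";
```

The bytes printed here are raw windows-874 values, which is probably why the browser only shows readable Thai once its encoding is switched manually.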

But the problem remains: I still have to change the browser encoding to Thai to make the text readable (Settings > Tools > Encoding > Thai in Chrome). Any suggestions?

Wilf
  • Er, generally you use a generic RFC 2047 decoder, which I'm sure your language has somewhere, to get a Unicode representation of it; then you output it using whatever character set your page is using (presumably UTF-8). – Max Apr 24 '14 at 20:15
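
A minimal sketch of the approach Max describes, not a complete RFC 2047 implementation. The function name decode_header_to_utf8 is hypothetical, and the sketch assumes the local iconv build recognises the charset named in the header (for example windows-874); if the conversion fails, the raw bytes are returned unchanged.

```php
<?php
// Hypothetical helper: decodes RFC 2047 encoded-words in a header value
// and returns the result as UTF-8.
function decode_header_to_utf8(string $header): string
{
    // Matches =?charset?Q-or-B?text?=
    $pattern = '/=\?([^?]+)\?([QqBb])\?([^?]*)\?=/';

    $decoded = preg_replace_callback($pattern, function (array $m): string {
        [, $charset, $encoding, $text] = $m;

        if (strtoupper($encoding) === 'B') {
            $bytes = base64_decode($text);
        } else {
            // In Q encoding "_" stands for a space; the rest is quoted-printable.
            $bytes = quoted_printable_decode(str_replace('_', ' ', $text));
        }

        // Assumption: iconv knows this charset name on the current system.
        $utf8 = @iconv($charset, 'UTF-8', $bytes);

        // Fall back to the raw bytes if the conversion fails.
        return $utf8 === false ? $bytes : $utf8;
    }, $header);

    // preg_replace_callback() returns null on a regex error; keep the input then.
    return $decoded ?? $header;
}
```

Calling decode_header_to_utf8($headers->subject) on the subject from the question should then give UTF-8 text, under the iconv assumption above. If the page is also sent with header('Content-Type: text/html; charset=UTF-8'), the browser should no longer need a manual switch to Thai.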