-1

I have written a script to read email from a mailbox.

In some emails, part of the data is being converted into weird characters that break my further processing.

Those characters look something like this: http://brucejohnson.ca/HTMLCharacters13.html

Any idea how to convert them back into the original content?

Anand

3 Answers

1

If the script is giving you those characters, then you have two options: view each character as is, or view the numerical equivalent of that character (in various bases: octal, hex, etc.).
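To illustrate the second option, here is a minimal Python sketch (the question does not say which language the script uses, so Python is an assumption). The helper name `describe` is made up for this example. It lists each character next to its code point in hex and octal, which makes it easier to see what the script actually received.

```python
import unicodedata

def describe(text):
    """Return (char, hex code point, octal code point, Unicode name) per character."""
    return [
        (ch, f"U+{ord(ch):04X}", oct(ord(ch)), unicodedata.name(ch, "UNKNOWN"))
        for ch in text
    ]

# "Ã©" is what "é" typically looks like when UTF-8 bytes are read as ISO-8859-1.
for row in describe("Ã©"):
    print(row)
```

Seeing pairs such as U+00C3 followed by another character in the U+0080 to U+00BF range is usually a hint that UTF-8 data was decoded with a single-byte charset.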

Are you sure that your script isn't trying to read an encrypted mail, and that it otherwise works correctly?

Try putting some dummy test data through the functions/script you've written to see if it produces the output you expect.

Hope this helps

OwenD
0

You need to check the charset encoding in the email headers first.

Once you have done this, you choose one of two methods: change the charset declared in the HTML to match the email, or convert the content (where possible) to the charset you're already using (probably UTF-8).

If you dynamically change the HTML charset in the header, your biggest problem is that users will need to specify the correct charset in their browser settings. For example, my browser is set to UTF-8 but my emails are in ISO-8859-1, so with this method I would need to change my browser charset every time I look at the site. A friend of mine has ISO-8859-1 as his normal charset, so he would have no problems.

If you encode the characters to UTF-8 (e.g. utf8_encode in PHP), you need to ensure the content isn't already in UTF-8; otherwise you may find the encode function creates other invalid characters.
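A small Python sketch of that double-encoding problem, in case it helps (Python rather than PHP is my choice here, and `to_utf8` is a hypothetical helper, not a library function). It first shows what happens when bytes that are already UTF-8 get converted again as if they were ISO-8859-1, then guards against it by checking whether the bytes are already valid UTF-8.

```python
text = "é"
utf8_bytes = text.encode("utf-8")  # two bytes: 0xC3 0xA9

# The mistake: treat UTF-8 bytes as ISO-8859-1 and convert them to UTF-8 again.
double_encoded = utf8_bytes.decode("iso-8859-1").encode("utf-8")
print(double_encoded.decode("utf-8"))  # shows "Ã©" instead of "é"

def to_utf8(raw: bytes, declared: str = "iso-8859-1") -> bytes:
    """Convert raw bytes to UTF-8 unless they already decode as UTF-8.

    Heuristic only: a short ISO-8859-1 string can, in rare cases,
    also happen to be valid UTF-8.
    """
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8, leave it alone
    except UnicodeDecodeError:
        return raw.decode(declared).encode("utf-8")

print(to_utf8(utf8_bytes))  # unchanged
print(to_utf8(b"\xe9"))     # ISO-8859-1 "é" converted to UTF-8
```

The validity check is only a fallback; when the email declares its charset, that declaration is the better source of truth.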

The way I handle this is basically to decode the MIME header of the email, then use preg_match in PHP to detect the charset being used. From there I either convert the content to UTF-8 or leave it as is.
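For comparison, a rough sketch of the same flow in Python, using the standard `email` package instead of a regex to find the charset (so this is a swapped-in technique, not the exact PHP approach described above). The sample message, the `body_as_text` helper, and the ISO-8859-1 fallback are all assumptions made for the example.

```python
from email import message_from_bytes
from email.header import decode_header, make_header

# Hypothetical sample message: ISO-8859-1 subject and quoted-printable body.
RAW = (
    b"From: sender@example.com\r\n"
    b"Subject: =?iso-8859-1?q?Caf=E9?=\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Ol=E1, pre=E7o: 10\r\n"
)

def body_as_text(msg, fallback="iso-8859-1"):
    """Decode a non-multipart body using the charset declared in Content-Type."""
    charset = msg.get_content_charset() or fallback
    payload = msg.get_payload(decode=True)  # undoes quoted-printable/base64
    try:
        return payload.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset label: fall back and mark bad bytes.
        return payload.decode(fallback, errors="replace")

msg = message_from_bytes(RAW)

# Decode the MIME encoded-word in the Subject header.
subject = str(make_header(decode_header(msg["Subject"])))

body = body_as_text(msg)
utf8_body = body.encode("utf-8")  # store or forward as UTF-8 from here on

print(subject)
print(body)
```

Multipart messages would need the same treatment applied to each text part, which this sketch leaves out.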

Dealing with mail and its various charsets can be very complicated at times. The charset depends on the sender of the email, so you don't really know in advance which one will be used. You need to really understand the various charsets, how they are best stored (if you store them), and how they are best displayed, and then translate this to your app and target market.

Good luck with your app.

Ryan
-1

Have you checked the character encoding? It should be UTF-8. If it is Western European, then convert it to UTF-8.

Rush
Charset encoding can be whatever is specified. UTF-8 is the recommended charset for English; however, ISO-8859-X, for example, is standard in Latin America rather than UTF-8. This creates other issues with web apps that were developed for an English-speaking market but delivered to a non-English-speaking market. I am from NZ but live and develop in Brazil, so it is a daily challenge for me. – Ryan Jul 14 '11 at 13:59