
Whenever I try to parse XML containing special characters such as ō or 満月先生, I get an error. The XML document claims to use UTF-8 encoding, but that does not seem to be the case. Here is what the troublesome text looks like when I view the XML in Firefox:

Bleach: The Diamond Dust Rebellion - M� Hitotsu no Hy�rinmaru; Bleach - The DiamondDust Rebellion - Mou Hitotsu no Hyourinmaru

On the actual website, Å� is actually the character ō.

<br /> One day, Doraemon and his friends meet Professor Mangetsu (����, Professor Mangetsu?), who studies magic and magical beings such as goblins, and his daughter Miyoko (���, Miyoko?), and are warned of the dangerous approximation of the &quot;star of the Underworld&quot; to the Earth&#039;s orbit.<br /> <br />

And once again, on the actual website, those characters appear as 満月先生 and 美夜子.

The XML file is well formed apart from those special characters, which certainly do not appear to be UTF-8 encoded. Is there a way to get NSXML to parse these XML files?

Snooze
  • Looks like UTF-8 interpreted as Latin-1 and reencoded. – Ignacio Vazquez-Abrams Jun 05 '10 at 07:51
  • As I mentioned, on the actual website the characters appear as ō and 満月先生 but in the XML document (defined as UTF-8 in the header) they show up as Å� and æº�æ��å��ç��. Do you think that is just Firefox interpreting the characters as Latin-1, or the people who created the XML document messed up? If I try loading the XML in Xcode with NSUTF8StringEncoding, it does not work. If I specify encodings such as NSASCIIStringEncoding or NSISOLatin1StringEncoding it will load the document, but displays the ō as Å which is the code for Å (looks like data loss). – Snooze Jun 05 '10 at 21:57
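The diagnosis in the first comment (UTF-8 bytes read as Latin-1 and then re-encoded) can be reproduced and reversed in a few lines. This is only a sketch in Python rather than Cocoa, and the helper names `garble` and `repair` are made up for illustration. It assumes the wrong decoding step really was ISO-8859-1; if it was Windows-1252 or something else, the round trip may not be lossless.

```python
def garble(text: str) -> str:
    """Simulate the suspected bug: UTF-8 bytes decoded as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled: str) -> str:
    """Undo it: recover the original bytes, then decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


original = "満月先生"
broken = garble(original)   # starts with 'æº', like the text seen in Firefox
fixed = repair(broken)      # back to the original string
```

If `repair` raises a `UnicodeDecodeError` or `UnicodeEncodeError` on the real data, the text was probably damaged in some other way and this approach does not apply.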

1 Answer


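To include characters that the document's encoding cannot represent directly, you need to use their character references. For example, to represent ö in HTML you can type &ouml;. XML predefines only five named entities (&lt;, &gt;, &amp;, &apos; and &quot;), so in an XML document the numeric reference &#246; is the safe form.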

Find more on Wikipedia: http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
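As a rough illustration of how character references behave in an XML parser, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. The `<title>` element and the sample strings are invented for the example and are not taken from the asker's file.

```python
import xml.etree.ElementTree as ET

# Numeric character references work in any XML document.
numeric = ET.fromstring("<title>Hy&#333;rinmaru</title>").text

# Named HTML entities such as &ouml; are not predefined in XML,
# so a plain XML parser rejects them unless a DTD declares them.
try:
    ET.fromstring("<title>&ouml;</title>")
    named_ok = True
except ET.ParseError:
    named_ok = False

# Going the other way: escape every non-ASCII character as a numeric reference.
escaped = "Hyōrinmaru".encode("ascii", "xmlcharrefreplace").decode("ascii")
```

Note that this only helps whoever produces the XML. It does not repair a file whose bytes were already mis-decoded upstream.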

Benny Skogberg