I am trying to parse an XML file on Android. It contains a tag with special characters, like

<subject><![CDATA[FÚTBOL]]></subject>

While trying to parse the above text, I get an exception saying "XML token not well formed". I am using XmlPullParser and have also specified the encoding using

parser.setInput(this.getInputStream(),"iso-8859-1");

I do not get an error when reading other characters like "áñí". I tried different encodings, but they all gave the same error.

Update

The problem was solved when I used the SAX parser instead of XmlPullParser.
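
For illustration, here is a minimal sketch of the SAX route on a plain JVM. It is not the code from the original project: the class name `SubjectSaxDemo` and the helper `parseSubject` are hypothetical, and the sample bytes are built in memory rather than read from the real feed. It assumes the feed is ISO-8859-1 and carries no encoding declaration, so the bytes are decoded by an explicit `InputStreamReader` and the parser never has to guess the character set.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SubjectSaxDemo {

    // Hypothetical helper: parses the XML bytes and returns all character data.
    static String parseSubject(byte[] xmlBytes) throws Exception {
        final StringBuilder text = new StringBuilder();

        // Collect character data; CDATA content is also delivered through characters().
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };

        // Assumption: the feed is ISO-8859-1. Decoding here means the parser
        // receives characters, not raw bytes, and does no encoding detection.
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(xmlBytes), StandardCharsets.ISO_8859_1);

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(reader), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // \u00DA is 'Ú'; the escape keeps this source file independent of its own encoding.
        byte[] xml = "<subject><![CDATA[F\u00DATBOL]]></subject>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseSubject(xml));
    }
}
```

`XmlPullParser` also has a `setInput(Reader)` overload, so the same explicit-decoding idea may apply there as well; that was not tested here.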

Mako

1 Answer

Your source file in ADT (assuming you use ADT) is saved as either ISO-8859-1 or UTF-8 (the default).

To check: right-click the file in any Eclipse navigator, select Properties, and open the Resource panel. At the bottom you should see how Eclipse (and ADT) will encode the source file before it is deployed to the AVD.

Alain Pannetier
  • But then, the XML prolog should specify the applicable character set, and you should not have to "guess" it. I understand this is not the case. When you access the XML from a standard browser, through its URL, is the document also detected as not well-formed? – Alain Pannetier Feb 19 '11 at 19:23
  • Yes, the xml tag doesn't have the encoding attribute, and I also get an error when viewing it in the browser. Could this be an issue with the XML file itself? – Mako Feb 20 '11 at 08:38
  • The code started working fine when I switched to the SAX parser instead of XmlPullParser. – Mako Feb 20 '11 at 17:10
  • Still, the fact that browsers also detect an encoding problem means that this is a workaround. If you have access to the server, then you might analyse the source data. Otherwise, let me know the URL and I'll have a look (in Portuguese too ;-) – Alain Pannetier Feb 20 '11 at 17:23
  • You were right, the character set used IS ISO-8859-1. However, since there is no XML declaration header, the XMLPullParser makes no assumption or the wrong assumption and complains about characters not in its default character set. I added an XML declaration in front of the XML file and it was then correctly parsed by FF. A hex dump of the file shows 'Ú' as 0xDA (ISO-8859-1 for 'Ú'; in UTF-8 we would have had 0xC3 0x9A). I believe you fell victim to [this problem](http://www.coderanch.com/t/495391/XML/Parsing-RSS-feeds-XML-Pull) – Alain Pannetier Feb 21 '11 at 15:20
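
The byte values quoted in the last comment can be checked with a short sketch. The class name `EncodingBytesDemo` and the helper `hex` are hypothetical, introduced only for this illustration; the code simply encodes 'Ú' (U+00DA) in both character sets and prints the resulting bytes.

```java
import java.nio.charset.StandardCharsets;

public class EncodingBytesDemo {

    // Hypothetical helper: formats bytes as space-separated hex, e.g. "0xC3 0x9A".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative byte values print as two hex digits.
            sb.append(String.format("0x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String u = "\u00DA"; // 'Ú', written as an escape to avoid source-encoding issues

        // Single byte in ISO-8859-1.
        System.out.println(hex(u.getBytes(StandardCharsets.ISO_8859_1))); // prints 0xDA

        // Two-byte sequence in UTF-8.
        System.out.println(hex(u.getBytes(StandardCharsets.UTF_8)));      // prints 0xC3 0x9A
    }
}
```

A lone 0xDA byte is not a valid UTF-8 sequence, which is consistent with a parser that assumes UTF-8 rejecting the file.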