1

I'm getting the following exception when trying to parse some XML:

org.apache.harmony.xml.ExpatParser$ParseException: At line 1, column 0: not well-formed (invalid token)

The main issue is that this only happens on Android 2.2 and 2.3 devices. The weirdest part is that the first time I parse the response it works, but every subsequent attempt throws the parsing exception.

My code is as follows:

        URL url = new URL("http://m.ideasmusik.com/rss/?ct=mx");
        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser sp = spf.newSAXParser();
        //InputSource is = new InputSource("http://m.ideasmusik.com/rss/?ct=mx");
        //is.setEncoding(HTTP.UTF_8);   

        // Parse content
        MusicRSSParser parser = new MusicHandler.MusicRSSParser(); //DefaultHandler
        XMLReader xr = sp.getXMLReader();
        xr.setContentHandler(parser);
        InputSource in = new InputSource(url.openStream());//is.getByteStream());
        in.setEncoding(HTTP.UTF_8);
        xr.parse(in);

The XML is UTF-8 (I've read that an incorrect encoding is a common cause of this error).

Any guess as to what is going wrong? I thought it could be something in my handler, but it crashes before my logic runs, right after the startDocument() method.

I have also tried passing the URL instead of an InputStream, with the same result.

EDIT

If I go to Application Management and clear the app's cache, it works again, but only the first time. How can the cache affect the parsing?

htafoya

2 Answers

3

Got it!

The problem is in the RSS feed itself!

Not every browser shows it (when they pretty-print the XML with syntax coloring, they hide the problem), but the raw source begins like this:

<?xml version="1.0" encoding="UTF-8"?>
      <rss version="2.0">
          <channel>
               <title>Top Canciones</title>
               <link>m.ideasmusik.com/rss/?ct=mx&</link> ...

The problem is that XML can't contain a raw & character; it has to be escaped as &amp;.

All the other special characters in the document were escaped, but I think they missed this one because it is in the link tag rather than in the main content.

Somehow the SAX parser ignores it on the first run.

What I did (until the RSS is fixed) was to read the response into a string and remove that & manually before parsing the XML. I know it is a horrible solution, but it's the quickest and easiest one for the moment.
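A minimal sketch of that workaround, not the original code. It assumes the whole feed fits in memory and that the response is UTF-8. The class and method names (`FeedSanitizer`, `escapeBareAmpersands`, `readFully`, `parse`) are hypothetical, and instead of deleting the character it escapes any & that does not already start an entity or character reference. The regex is a simplification: it only recognises plain alphanumeric entity names, and it would also rewrite a raw & inside a CDATA section, where one is actually legal.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; the names are illustrative, not from the original app.
final class FeedSanitizer {
    // Matches an '&' that is NOT followed by "name;", "#123;" or "#x1F;".
    private static final Pattern BARE_AMPERSAND =
            Pattern.compile("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Escapes stray ampersands and leaves existing references untouched.
    static String escapeBareAmpersands(String xml) {
        return BARE_AMPERSAND.matcher(xml).replaceAll("&amp;");
    }

    // Reads the whole stream into a String, assuming UTF-8 content.
    static String readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toString("UTF-8");
    }

    // Buffers the response, repairs it, then hands it to SAX as characters.
    static void parse(InputStream in, DefaultHandler handler) throws Exception {
        String xml = escapeBareAmpersands(readFully(in));
        SAXParser sp = SAXParserFactory.newInstance().newSAXParser();
        sp.parse(new InputSource(new StringReader(xml)), handler);
    }
}
```

With the code from the question, this would be called as `FeedSanitizer.parse(url.openStream(), parser)`. Because the parser receives a character stream, no separate setEncoding call is needed.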

htafoya
  • If you could share the code it would be great. I am facing a similar issue and am not sure how to fix it. – nathandrake Mar 29 '16 at 10:03
  • @nathandrake I don't have the code right now, but instead of streaming and parsing the XML directly, it was first saved to a String, the character was replaced, and then it was parsed with SAX as usual. However, the best solution would be to ask the backend developer to escape the character, or the content manager to remove the &. – htafoya Apr 02 '16 at 16:39
0

but the weirdest thing is that the first time I parse the response it is ok, but all the following tries give me the parsing exception

I had the same problem. It happens on some devices (e.g. the Samsung Galaxy S2), and not only on Android 2.3 but also on later versions. For example, it occurs on a Galaxy S2 (4.4.2) but not on the emulator (4.4.2). The problem is probably caused by caching of the request: from the second request on, the XML string seems to have been written to the cache and read back with wrongly encoded character(s).

I solved my problem (after a lot of work ;) ) by simply calling setUseCaches(false) on my connection:

    URLConnection conn = url.openConnection();
    conn.setUseCaches(false);
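
Combined with the parsing code from the question, that might look roughly like the sketch below. The class and method names (`UncachedFeedLoader`, `parse`) are hypothetical, and it assumes the feed is UTF-8 as stated in the question; only the `setUseCaches(false)` call is the actual fix.

```java
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; the names are illustrative only.
final class UncachedFeedLoader {
    static void parse(URL url, DefaultHandler handler) throws Exception {
        URLConnection conn = url.openConnection();
        // Must be called before the connection is opened (before getInputStream()).
        conn.setUseCaches(false);

        InputStream stream = conn.getInputStream();
        try {
            InputSource source = new InputSource(stream);
            source.setEncoding("UTF-8"); // assumption: the feed is UTF-8
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        } finally {
            stream.close();
        }
    }
}
```

Opening the connection explicitly instead of calling `url.openStream()` is what makes it possible to change the cache setting before any data is read.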
AppiDevo