
I'm making an Android application that reads an XML file from the Internet. The application uses SAX to parse the XML. This is my code for the parsing part:

public LectorSAX(String url){
    try{
        SAXParserFactory spf=SAXParserFactory.newInstance();
        SAXParser sp = spf.newSAXParser();
        DefaultHandler lxmlr=new LibraryXMLReader() ;
        sp.parse(url, lxmlr);

        nodo=((LibraryXMLReader)lxmlr).getNodoActual();

    }catch(ParserConfigurationException e){ 
        System.err.println("Error de parseo en LectorSAX.java: "+e);
    }catch(SAXException e){
        System.err.println("Error de sax LectorSAX.java: " + e);
    } catch (IOException e){
        System.err.println("Error de  io LectorSAX.java: " + e);
    }
}

The problem is that a SAXException is thrown. The exception message is as follows:

org.apache.harmony.xml.ExpatParser$ParseException: At line 4, column 42: not well-formed (invalid token)

However, if I put the same code in a normal Java SE application, this exception does not occur and everything works fine.

Why does the same code work fine in a Java SE application but not on Android? And how can I solve the problem?

Thanks for the help.

Greetings.

javanna
Lobo
  • Can you share your XML? Judging by the error, there is a problem with your XML. – Code_Life Jan 12 '12 at 05:08
  • @MohitSharma But why does the same code work fine in a Java SE application and not on Android? This is the URL: http://www.aemet.es/xml/municipios/localidad_33002.xml – Lobo Jan 12 '12 at 09:21
  • Originally I thought there was a problem with your XML, judging by the error, but that's not the case. Now the only option is to debug the parsing. – Code_Life Jan 12 '12 at 09:55
  • Maybe this has something to do with the encoding? Do you use UTF-8? I don't know the specifics of Android. – Ludovic Kuty Jan 17 '12 at 11:33
  • @Ikuti I have not specified an encoding. Is it necessary on Android? It doesn't seem to be required in Java. – Lobo Jan 17 '12 at 12:16

1 Answer


This could be a character encoding problem.
As you can see, the invalid token error points to line 4.
In this line, you can find an acute accent (Meteorología) and a tilde (España). The XML declaration specifies ISO-8859-15 as the encoding. Since it's less common than the UTF encodings or ISO-8859-1, this could result in an error when the SAXParser connects and tries to convert the byte content into characters using your system's default charset.
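To see why a charset mismatch surfaces as "not well-formed (invalid token)", the sketch below (an illustration, not part of the original answer) encodes accented text the way a single-byte ISO-8859-1/-15 file stores it, then tries to decode those bytes as strict UTF-8. The sample string and class name are hypothetical. Because í and ñ have the same byte values in ISO-8859-1 and ISO-8859-15, the always-available StandardCharsets.ISO_8859_1 stands in for ISO-8859-15 here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {
    // Single-byte encoding of the text, as it would appear in an ISO-8859-1/-15 file.
    static byte[] latinBytes(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Strict UTF-8 decoding: report malformed input instead of silently replacing it.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "Meteorología España", written with Unicode escapes so the source file encoding doesn't matter.
        String text = "Meteorolog\u00eda Espa\u00f1a";
        System.out.println(isValidUtf8(latinBytes(text)));                       // false
        System.out.println(isValidUtf8(text.getBytes(StandardCharsets.UTF_8))); // true
    }
}
```

In the single-byte form, í is the byte 0xED, which UTF-8 treats as the start of a three-byte sequence; the following "a" is not a continuation byte, so a UTF-8 decoder rejects it.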

So you'll need to tell the SAXParser which charset to use. One way to do so is to pass an InputSource, instead of the URL, to the parse method. For example:

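import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();

// Wrap the URL in an InputSource so the charset can be set explicitly
InputSource is = new InputSource(url);
is.setEncoding("ISO-8859-15");

DefaultHandler lxmlr = new LibraryXMLReader();
parser.parse(is, lxmlr); // use "parser", the variable declared above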
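A self-contained sketch of the same InputSource technique that runs on Java SE without network access. Instead of the AEMET URL, it parses a hypothetical in-memory document stored as ISO-8859-1 bytes with no encoding declaration, so setEncoding is the only thing telling the parser how to decode it. TextCollector is a hypothetical stand-in for LibraryXMLReader that just gathers character data.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ExplicitEncodingDemo {
    // Hypothetical handler: accumulates all character data it receives.
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    static String parseWithEncoding(byte[] xml, String encoding) throws Exception {
        InputSource source = new InputSource(new ByteArrayInputStream(xml));
        source.setEncoding(encoding); // tell the parser how to turn the bytes into chars
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TextCollector handler = new TextCollector();
        parser.parse(source, handler);
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        // "<nombre>Meteorología, España</nombre>" stored as single-byte Latin text.
        byte[] xml = "<nombre>Meteorolog\u00eda, Espa\u00f1a</nombre>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseWithEncoding(xml, "ISO-8859-1"));
    }
}
```

Without the setEncoding call, a document with no declaration and no byte order mark defaults to UTF-8 under the XML rules, and the parser would reject the first accented byte.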

EDIT: It seems that the Android VM does not support this encoding; it throws an org.apache.harmony.xml.ExpatParser$ParseException: At line 1, column 0: unknown encoding exception.
Since ISO-8859-15 is mostly compatible with ISO-8859-1, except for some specific characters (see http://en.wikipedia.org/wiki/ISO/IEC_8859-15), a workaround is to change the ISO-8859-15 value to ISO-8859-1 in the setEncoding call, forcing the parser to use a different but compatible charset:

is.setEncoding("ISO-8859-1");
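To check how safe this workaround is, the following sketch (an illustration, not from the original answer) decodes every byte value with both charsets and lists the ones that differ. It relies on the JDK providing an "ISO-8859-15" charset, which standard Java SE runtimes do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class Latin1VsLatin9 {
    // Returns every byte value that decodes to a different character
    // in ISO-8859-15 than in ISO-8859-1.
    static List<Integer> differingBytes() {
        Charset latin9 = Charset.forName("ISO-8859-15");
        List<Integer> differing = new ArrayList<>();
        for (int b = 0; b < 256; b++) {
            byte[] single = { (byte) b };
            String asLatin1 = new String(single, StandardCharsets.ISO_8859_1);
            String asLatin9 = new String(single, latin9);
            if (!asLatin1.equals(asLatin9)) {
                differing.add(b);
            }
        }
        return differing;
    }

    public static void main(String[] args) {
        for (int b : differingBytes()) {
            System.out.printf("0x%02X%n", b);
        }
    }
}
```

Only eight positions differ (0xA4 is € in ISO-8859-15 but ¤ in ISO-8859-1, for instance), and the Spanish letters in this feed (í = 0xED, ñ = 0xF1) are not among them, so reading the file as ISO-8859-1 only goes wrong if it contains one of those eight characters.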

It seems that, since Android doesn't support the declared charset, it falls back to its default (UTF-8), and hence the parser can't use the XML declaration to choose the appropriate encoding.
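A possible alternative, which is not part of the original answer, is to decode the bytes with Java's own charset support and hand the parser a character stream. Per the SAX InputSource documentation, setEncoding has no effect once a character stream is supplied, because the parser receives characters rather than bytes. The sketch below runs on Java SE; whether Android's Expat-based parser also ignores the ISO-8859-15 declaration in this mode is an assumption that would need testing on a device. The class name and sample document are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ReaderSourceDemo {
    // Hypothetical handler: accumulates all character data it receives.
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Decode with Java's charset, then let the parser consume characters, not bytes.
    static String parseDecoded(InputStream in, String charsetName) throws Exception {
        Reader reader = new InputStreamReader(in, Charset.forName(charsetName));
        InputSource source = new InputSource(reader);
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document that declares ISO-8859-15, like the AEMET feed.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-15\"?><nombre>Espa\u00f1a</nombre>";
        byte[] bytes = xml.getBytes(Charset.forName("ISO-8859-15"));
        System.out.println(parseDecoded(new ByteArrayInputStream(bytes), "ISO-8859-15"));
    }
}
```

With a real feed, the InputStream would come from the HTTP connection instead of a byte array.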

Tomas Narros
  • Hi @tomas-narros, thanks, I'll try it and let you know the result. – Lobo Jan 17 '12 at 12:14
  • Shouldn't the parser precisely use the XML declaration to choose the appropriate encoding? – JB Nizet Jan 17 '12 at 12:24
  • That's a good point @JB. It certainly should. But I'm pretty sure that this is an encoding problem. – Tomas Narros Jan 17 '12 at 13:38
  • Hi @TomasNarros, I tried the encoding you suggested, but I get the following error: Error de sax LectorSAX.java: org.apache.harmony.xml.ExpatParser$ParseException: At line 1, column 0: unknown encoding – Lobo Jan 18 '12 at 19:17
  • @Lobo: Okay. It seems that the Android VM does not support this encoding. Since ISO-8859-15 is mostly compatible with ISO-8859-1, except for some specific characters (see http://en.wikipedia.org/wiki/ISO/IEC_8859-15), I would try changing the ISO-8859-15 value to ISO-8859-1 in the setEncoding call. It seems that as long as Android doesn't support the charset, it uses its default (UTF-8), and hence the parser can't use the XML declaration to choose the appropriate encoding. Please check and tell me if it worked. – Tomas Narros Jan 19 '12 at 09:48
  • @Lobo: I'm glad to hear it. I will update the answer so that the solution is available there for future reference. – Tomas Narros Jan 20 '12 at 08:55
  • @TomasNarros I am using the Xml parser, and its Xml.parse() method does not support `InputSource`. Any idea in this case? – Ravi Bhatt Apr 25 '13 at 07:08