
I am seeing strange behaviour when reading XML attribute values with SAX. It looks like a bug in the SAX library I am using, which is the system default.

The XML is very simple:

<?xml version="1.0"?>
<VOTABLE version="1.1">
    <RESOURCE type="results">
        <INFO name="QUERY_STATUS" value="OK" />
        <TABLE>
            <FIELD ID="Reference" ucd="DATA_LINK" datatype="char" arraysize="*" />
            <FIELD ID="URN" ucd="HCSS_URN" datatype="char" arraysize="*" />
            <FIELD ID="HCSSFileName" ucd="HCSS_FILE_NAME" datatype="char" arraysize="*" />
        </TABLE>
    </RESOURCE>
</VOTABLE>

For example, when reading an attribute value I sometimes see: startElement: Attr: 'cCSS_FILE_NAME' from com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser$AttributesProxy

Somehow the text parsing has gone wrong when calling, e.g., attributes.getValue(id); the value should be HCSS_FILE_NAME.

Various forums report bugs in the built-in SAX parsers. For example, I found the following:

https://community.oracle.com/thread/1627769
http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6690015

I also read that something like this can occur with XML 1.1, or when reading long attribute values, but neither applies here.

The only thing I can think of is to configure a different SAX parser, such as org.apache.xerces.jaxp.SAXParserFactoryImpl.

Thanks for any tips.

dbank
user1472672
  • I don't follow. What exactly is your problem? Can you link it to the example you have given? – Victor Mar 10 '15 at 14:32
  • I think it's quite clear from the example. The attribute value is read incorrectly by SAX. It returns 'cCSS_FILE_NAME' but should return what is in the XML, i.e. HCSS_FILE_NAME. – user1472672 Mar 10 '15 at 15:23
  • OK. Sorry, I did not see the field you were talking about. Can you post the code you have used to read the XML? On the other hand, if you want to use the `SAXParserFactoryImpl` you can use this: `System.setProperty("javax.xml.parsers.SAXParserFactory", "org.apache.xerces.jaxp.SAXParserFactoryImpl");` – Victor Mar 10 '15 at 15:30

1 Answer

Don't use the XML parser built in to the JDK. It is buggy, and the most common bug manifests itself as corrupt attribute values. This bug has been around for years, and as far as I know is present in all JDK versions. Switch to using the version of Xerces from Apache.
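As a rough illustration of the switch, the sketch below reads the `ucd` attributes from the question's XML and lets the caller choose the factory implementation. The class name `AttributeReader`, the method `readUcds`, and its `factoryClassName` parameter are made up for this example. It assumes that passing `org.apache.xerces.jaxp.SAXParserFactoryImpl` only works when the Apache Xerces jar is on the classpath; passing `null` falls back to whatever factory JAXP resolves by default.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class AttributeReader {

    // The XML from the question, inlined so the example is self-contained.
    static final String SAMPLE_XML =
        "<?xml version=\"1.0\"?>"
        + "<VOTABLE version=\"1.1\"><RESOURCE type=\"results\">"
        + "<INFO name=\"QUERY_STATUS\" value=\"OK\" /><TABLE>"
        + "<FIELD ID=\"Reference\" ucd=\"DATA_LINK\" datatype=\"char\" arraysize=\"*\" />"
        + "<FIELD ID=\"URN\" ucd=\"HCSS_URN\" datatype=\"char\" arraysize=\"*\" />"
        + "<FIELD ID=\"HCSSFileName\" ucd=\"HCSS_FILE_NAME\" datatype=\"char\" arraysize=\"*\" />"
        + "</TABLE></RESOURCE></VOTABLE>";

    // Hypothetical helper: factoryClassName == null means "use the JAXP default".
    static List<String> readUcds(String xml, String factoryClassName) throws Exception {
        SAXParserFactory factory = (factoryClassName == null)
            ? SAXParserFactory.newInstance()
            // Assumption: the named class is available on the classpath.
            : SAXParserFactory.newInstance(factoryClassName, null);
        SAXParser parser = factory.newSAXParser();

        final List<String> ucds = new ArrayList<>();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if ("FIELD".equals(qName)) {
                    // Copy the value out inside the callback; the Attributes
                    // object itself should not be kept after startElement returns.
                    ucds.add(attributes.getValue("ucd"));
                }
            }
        });
        return ucds;
    }

    public static void main(String[] args) throws Exception {
        // Shows which factory implementation is actually being used.
        System.out.println(SAXParserFactory.newInstance().getClass().getName());
        System.out.println(readUcds(SAMPLE_XML, null));
    }
}
```

Printing the factory class name is a quick way to confirm whether the JDK's internal parser or the Apache one was picked up.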

Michael Kay