I'm having problems parsing an XML string using XmlBeans. The problem occurs in a J2EE application where the string is received from external systems, but I replicated it in a small test project.

The only solution I found is to let XmlBeans parse a File instead of a String, but that's not an option in the J2EE application. I also want to understand what exactly the problem is so that I can solve it properly.

Source of test class:

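package xmlspy;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.xmlbeans.XmlException;
import org.apache.xmlbeans.XmlObject;
import org.apache.xmlbeans.XmlOptions;

public class TestXmlSpy {

    public static void main(String[] args) throws IOException {
        // Read the whole file into a String, decoding it as UTF-8.
        StringBuilder sb = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(new FileInputStream("d:\\temp\\IE734.xml"), "UTF-8"))) {
            String str;
            while ((str = r.readLine()) != null) {
                sb.append(str); // readLine() drops the line terminators
            }
        }
        String xml = sb.toString().trim();
        System.out.println("Ready reading XML");

        XmlOptions options = new XmlOptions();
        options.setCharacterEncoding("UTF-8");

        try {
            // Parsing the File works.
            XmlObject xmlObject = XmlObject.Factory.parse(new File("D:\\temp\\IE734.xml"), options);
            System.out.println("Ready parsing File");
            // Parsing the same content as a String throws the XmlException shown below.
            XmlObject.Factory.parse(xml, options);
            System.out.println("Ready parsing String");
        } catch (XmlException e) {
            e.printStackTrace();
        }
    }
}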

The XML file validates perfectly against the XSDs I'm using. Parsing it as a File object also works fine and gives me a parsed XmlObject to work with. However, parsing the XML string gives the stack trace below. I've checked the string in the debugger and don't see anything wrong with it at first sight, especially not at row 1, column 1, which is where the SAX parser seems to have a problem, if I'm interpreting the error correctly.
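One way to see what actually sits at row 1, column 1 is to print the code points of the first few characters of the string, since some characters are invisible in a debugger view. The sketch below is a diagnostic only. The names `FirstChars` and `describePrefix` are made up for illustration, and the sample string with a leading U+FEFF is an assumed input, not the contents of the real file.

```java
class FirstChars {

    /** Returns the first 'count' code points of s formatted as "U+XXXX", separated by spaces. */
    static String describePrefix(String s, int count) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        int shown = 0;
        while (i < s.length() && shown < count) {
            int cp = s.codePointAt(i);
            if (shown > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // step over surrogate pairs correctly
            shown++;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical input: an XML declaration preceded by a byte order mark.
        String xml = "\uFEFF<?xml version=\"1.0\"?><root/>";
        System.out.println(describePrefix(xml, 3)); // prints "U+FEFF U+003C U+003F"
    }
}
```

If the first code point is anything other than `U+003C` (`<`) or whitespace, that character is a likely candidate for the error reported at line 1, column 1.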

(Screenshot: the string as shown in the debugger.)

Stacktrace:

Ready reading XML
Ready parsing File
org.apache.xmlbeans.XmlException: error: Unexpected element: CDATA
    at org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Locale.java:3511)
    at org.apache.xmlbeans.impl.store.Locale.parse(Locale.java:713)
    at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:697)
    at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:684)
    at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:208)
    at org.apache.xmlbeans.XmlObject$Factory.parse(XmlObject.java:658)
    at xmlspy.TestXmlSpy.main(TestXmlSpy.java:37)
Caused by: org.xml.sax.SAXParseException; systemId: file:; lineNumber: 1; columnNumber: 1; Unexpected element: CDATA
    at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.reportFatalError(Piccolo.java:1038)
    at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.parse(Piccolo.java:723)
    at org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Locale.java:3479)
    ... 6 more
Andrew Barber
Martijn
  • Not sure if this will help, but try calling setLoadStripComments() on the options object before parsing. I think the parser is barfing on the comment – cmbaxter May 19 '13 at 22:30
  • Are you sure you are reading the file into a string using the correct encoding? The file-based reader will apply automatic encoding detection, while the default encoding your `BufferedReader` is using may be invalid for the file. – Stephen Connolly May 19 '13 at 23:11
  • Also, can you add the xml string that you are parsing that is causing this failure? – cmbaxter May 19 '13 at 23:20
  • setLoadStripComments() didn't help. There is most likely something wrong with the string, but I can't see what. Even if I solve it, I need to find out how to convert the string to a "working" one, since in the actual application the string is passed in from an external system, so I can't read the file myself there. – Martijn May 19 '13 at 23:26
  • The XML is at: http://www.nyn.dds.nl/debug.txt The first characters do look funky (smaller), so I guess it's a charset problem of some sort? It doesn't show when I open it in Chrome, but it looks weird when I view it in Notepad. – Martijn May 19 '13 at 23:34
  • Solved it (sort of) by converting the String to an InputStream: `InputStream good = new ByteArrayInputStream(xml.getBytes("UTF-8"));` and then parsing that: `XmlObject.Factory.parse(good, options);` I'm still open to tips if there is a neater way to handle this, though. Thanks for getting me on the right track! Your tips got me looking into the encoding/charsets further! – Martijn May 19 '13 at 23:54
  • @ArjanTijms Could you explain the reason for all of these retags? – Andrew Barber May 26 '13 at 11:06
  • @AndrewBarber do you mean the edits? There was no re-tagging on this question. The J2EE to Java EE change is because the term "J2EE" has been deprecated since early 2007. SO itself automatically renames the "J2EE" tag as well. Using "J2EE" on new questions often leads to a number of comments from users saying it's antiquated and should not be used. For this question specifically it's IMHO rather clear the user meant Java EE and not J2EE. Why do you think J2EE is better here? Did I miss something? – Arjan Tijms May 26 '13 at 13:32
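The workaround from the comments (encode the string as UTF-8 bytes and parse the resulting stream) can be sketched with the standard library alone. The sketch also strips a leading U+FEFF first, on the assumption that the "funky" first characters are a byte order mark. That is a guess based on the comments, not something the thread confirms. `XmlStringPrep`, `stripBom` and `toUtf8Stream` are hypothetical names, and the XmlBeans call appears only as a comment, so the block runs without the library.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class XmlStringPrep {

    /** Removes one leading U+FEFF if present; String.trim() does not remove it. */
    static String stripBom(String xml) {
        if (xml != null && !xml.isEmpty() && xml.charAt(0) == '\uFEFF') {
            return xml.substring(1);
        }
        return xml;
    }

    /** Encodes the string as UTF-8 so it can be handed to a stream-based parser. */
    static InputStream toUtf8Stream(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical payload standing in for the string received from the external system.
        String received = "\uFEFF<root/>";
        String cleaned = stripBom(received);
        System.out.println(cleaned); // prints "<root/>"

        try (InputStream in = toUtf8Stream(cleaned)) {
            // With XmlBeans on the classpath, this is where the call from the comment would go:
            // XmlObject xmlObject = XmlObject.Factory.parse(in, options);
            System.out.println(in.available() + " bytes ready to parse");
        }
    }
}
```

Using `StandardCharsets.UTF_8` instead of the string `"UTF-8"` avoids the checked `UnsupportedEncodingException`.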

2 Answers

This is an encoding problem. The code below worked for me:

        File xmlFile = new File("./data/file.xml");
        FileDocument fileDoc = FileDocument.Factory.parse(xmlFile);

The exception is caused by the length of the XML file. If you add or remove one character from the file, the parser will succeed.

The problem occurs within the third-party PiccoloLexer library that XMLBeans relies on. It has been fixed in revision 959082, but the fix has not been applied to the xbean 2.5 jar.
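To check whether a particular payload has the length named in the title of the linked thread (8193 bytes), a small stdlib-only check can help. This sketch assumes the length is measured on the UTF-8 encoded document. `XmlLengthCheck`, `utf8Length` and `hasSuspectLength` are hypothetical names, and whether this length triggers the error in a given XmlBeans build is not verified here.

```java
import java.nio.charset.StandardCharsets;

class XmlLengthCheck {

    // Taken from the title of the linked thread; treat it as an assumption, not a guarantee.
    static final int SUSPECT_BYTES = 8193;

    /** Length of the document in bytes once encoded as UTF-8. */
    static int utf8Length(String xml) {
        return xml.getBytes(StandardCharsets.UTF_8).length;
    }

    static boolean hasSuspectLength(String xml) {
        return utf8Length(xml) == SUSPECT_BYTES;
    }

    public static void main(String[] args) {
        // Build an ASCII-only document of exactly SUSPECT_BYTES bytes for demonstration.
        StringBuilder sb = new StringBuilder("<a>");
        while (sb.length() < SUSPECT_BYTES - 4) {
            sb.append('x');
        }
        sb.append("</a>");
        String xml = sb.toString();

        System.out.println(utf8Length(xml) + " bytes, suspect=" + hasSuspectLength(xml));
        // One extra character changes the length, as described in the answer above.
        System.out.println("with one extra space, suspect=" + hasSuspectLength(xml + " "));
    }
}
```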

What does the org.apache.xmlbeans.XmlException with a message of “Unexpected element: CDATA” mean?

XMLBeans - Problem with XML files if length is exactly 8193bytes

Issue reported on XMLBean Jira

jcwhall