
Using the following code, I'm successfully reading XML files. However, when a comment appears in the middle of a node's text, the reader discards the remainder of the node. For example:

<text>thisismy<!--comment-->document</text>

would result in a return string of "thisismy" and nothing else.

I had a similar problem earlier when I encountered special characters such as &, and setting the XMLInputFactory property javax.xml.stream.isCoalescing to true fixed that. I'm guessing I've run into a related behavior.

I need to be able to process such documents elegantly. Can anyone suggest how I might work around such interruptions?

try {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty("javax.xml.stream.isCoalescing", true);
        XMLEventReader eventReader =
                factory.createXMLEventReader(new FileReader(fileName));

        while(eventReader.hasNext()) {
            XMLEvent event = eventReader.nextEvent();

            switch(event.getEventType()) {

                case XMLStreamConstants.START_ELEMENT:
                    StartElement startElement = event.asStartElement();
                    String qName = startElement.getName().getLocalPart();

                    if (qName.equalsIgnoreCase("page")) {
                        page = new DocumentPage();
                        Iterator<Attribute> attributes = startElement.getAttributes();
                        while(attributes.hasNext())
                        {
                            Attribute attribute = attributes.next();
                            switch (attribute.getName().toString().toLowerCase()) {
                                case "index" :
                                    pageIndex = attribute.getValue();
                                    page.setPageIndex(pageIndex);
                                    break;
  • This is probably an issue with your code (you don't show the relevant parts), and not with the reader. The reader will not coalesce character sections separated by a comment, so you must handle multiple CHARACTERS events "for the same" START_ELEMENT – forty-two Jul 31 '18 at 10:35
  • Thanks. I figured that might be the case and so set about doing that. I'm now copying the characters to a buffer and then wrapping it up in the END_ELEMENT section. Ugly IMO but it works. Thanks for the confirmation. Shame that it doesn't support a stripComments mode. – srodden Jul 31 '18 at 10:39
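
For anyone landing here, the buffering approach described in the comments above could look roughly like the sketch below. It is not the original poster's code: the class name CommentSafeText and the method readText are hypothetical, the input is read from a String rather than a file, and only the first <text> element is handled. Every CHARACTERS event is appended to a buffer, COMMENT events are ignored, and the buffer is returned at the matching END_ELEMENT.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class, for illustration only.
final class CommentSafeText {

    // Returns the text of the first <text> element, joining the character
    // runs that a comment splits apart. Returns null if no <text> is found.
    static String readText(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Coalescing merges adjacent character data and CDATA,
        // but it does not merge text across a comment.
        factory.setProperty(XMLInputFactory.IS_COALESCING, true);
        try {
            XMLEventReader reader =
                    factory.createXMLEventReader(new StringReader(xml));
            StringBuilder buffer = null; // non-null only while inside <text>
            try {
                while (reader.hasNext()) {
                    XMLEvent event = reader.nextEvent();
                    switch (event.getEventType()) {
                        case XMLStreamConstants.START_ELEMENT:
                            if ("text".equals(
                                    event.asStartElement().getName().getLocalPart())) {
                                buffer = new StringBuilder();
                            }
                            break;
                        case XMLStreamConstants.CHARACTERS:
                            // May fire several times for the same element.
                            if (buffer != null) {
                                buffer.append(event.asCharacters().getData());
                            }
                            break;
                        case XMLStreamConstants.END_ELEMENT:
                            if (buffer != null && "text".equals(
                                    event.asEndElement().getName().getLocalPart())) {
                                return buffer.toString();
                            }
                            break;
                        default:
                            // COMMENT and all other event types are skipped.
                            break;
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return null;
    }
}
```

Calling CommentSafeText.readText("<text>thisismy<!--comment-->document</text>") returns "thisismydocument". Note that this sketch also appends the text of any child elements nested inside <text>; if that matters, track the element depth as well.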

0 Answers