Is it feasible in Java, using the SAX API, to parse a list of XML fragments with no root element from an input stream?

I tried parsing such a stream but got a

org.xml.sax.SAXParseException: The markup in the document following the root element must be well-formed.

before the endDocument event was even fired.

I would rather not settle for obvious but clumsy solutions such as "prepend a custom root element" or "use buffered fragment parsing".

I am using the standard SAX API of Java 1.6. The SAX factory had setValidating(false) in case anyone wondered.

yannisf
  • Duplicate of http://stackoverflow.com/questions/3232110/parse-file-containing-xml-fragments-in-java. – james.garriss Nov 19 '14 at 16:29
  • You can refer to [Resolving "The markup in the document following the root element must be well-formed" Exception](http://opensourceforgeeks.blogspot.in/2015/01/resolving-markup-in-document-following.html) – Aniket Thakur Jan 26 '15 at 18:59

1 Answer

First, and most important of all, the content you are parsing is not an XML document. From the XML Specification:

[Definition: There is exactly one element, called the root, or document element, no part of which appears in the content of any other element.]

Now, as to parsing this with SAX - in spite of what you said about clumsiness - I'd suggest the following approach:

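import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;

// Sandwich the fragment stream between an opening and a closing root tag.
Enumeration<InputStream> streams = Collections.enumeration(
    Arrays.asList(new InputStream[] {
        new ByteArrayInputStream("<root>".getBytes()),
        yourXmlLikeStream,
        new ByteArrayInputStream("</root>".getBytes()),
    }));

// The three streams are read back to back as if they were one.
SequenceInputStream seqStream = new SequenceInputStream(streams);

// Now pass the `seqStream` into the SAX parser.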

SequenceInputStream is a convenient way to concatenate multiple input streams into a single stream. The streams are read in the order they are passed to the constructor (or, in this case, the order in which the Enumeration returns them).

Pass it to your SAX parser, and you are done.
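
For completeness, here is a minimal end-to-end sketch of the approach above, not a definitive implementation. The class name `FragmentSaxDemo`, the method `topLevelElements`, and the wrapper element name `synthetic-root` are made up for illustration. It assumes the fragment stream carries no XML declaration or DOCTYPE of its own, since neither may appear after the injected opening tag. The handler tracks nesting depth so that the synthetic wrapper can be told apart from the real fragment roots.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class FragmentSaxDemo {

    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Hypothetical wrapper name; choose one that cannot clash with real data.
    private static final String WRAPPER = "synthetic-root";

    /** Returns the names of the top-level fragment elements, in stream order. */
    static List<String> topLevelElements(InputStream fragments)
            throws IOException, SAXException, ParserConfigurationException {
        // Opening tag, then the caller's fragments, then the closing tag.
        List<InputStream> parts = Arrays.<InputStream>asList(
                new ByteArrayInputStream(("<" + WRAPPER + ">").getBytes(UTF8)),
                fragments,
                new ByteArrayInputStream(("</" + WRAPPER + ">").getBytes(UTF8)));
        InputStream wrapped = new SequenceInputStream(Collections.enumeration(parts));

        final List<String> names = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // Depth 0 is the synthetic wrapper; depth 1 is a fragment root.
                if (depth == 1) {
                    names.add(qName);
                }
                depth++;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                depth--;
            }
        };

        // The default factory is not namespace-aware, so qName holds the tag name.
        SAXParserFactory.newInstance().newSAXParser().parse(wrapped, handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new ByteArrayInputStream(
                "<item id=\"1\"/><item id=\"2\">text</item><note/>".getBytes(UTF8));
        System.out.println(topLevelElements(in)); // prints [item, item, note]
    }
}
```

Because the wrapper is stripped out by depth rather than by name, the handler keeps working even if a fragment happens to contain an element with the same name as the wrapper further down the tree.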

npe
  • Agreed - the reason for the clumsy appending of a root element is that you are dealing with clumsy data. Otherwise, as soon as you close the first element you opened, the SAX parser will believe it has finished, as it has. I also do it this way for a formatted-like-XML stream of data – Woody Jun 27 '12 at 13:24
  • Although you provided an answer I have already thought of, the implementation is much more elegant than I could ever think! Thank you for your answer. – yannisf Jun 27 '12 at 14:03
  • Well, `SequenceInputStream` is one of those *long forgotten* utilities, that nobody seems to know about, despite being there since Java 1.0. Just wanted to remind it's still there. :) – npe Jun 27 '12 at 14:32