7

Update: There is no ready-made XML parser in the Java community that combines NIO (non-blocking I/O) with XML parsing. This is the closest I found, and it's incomplete: http://wiki.fasterxml.com/AaltoHome

I have the following code:

InputStream input = ...;
XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();

XMLStreamReader streamReader = xmlInputFactory.createXMLStreamReader(input, "UTF-8");

The question is: why does the method #createXMLStreamReader() expect the input stream to contain an entire XML document? Why is it called a "stream reader" if it can't seem to process a portion of the XML data? For example, if I feed:

<root>
    <child>

to it, it tells me I'm missing the closing tags, even before I begin iterating over the stream reader itself. I suspect that I just don't know how to use an XMLStreamReader properly. I should be able to supply it with data in pieces, right? I need this because I'm processing an XML stream coming in from a network socket, and I don't want to load the whole source text into memory.

Thank you for help, Yuri.

Yuri Geinish

6 Answers

3

You can get what you want (a partial parse), but you must not close the stream when you reach the end of the currently available data. Keep the stream open, and the parser will simply block when it runs out of data. When more data arrives, add it to the stream, and the parser will continue.

This arrangement requires two threads: one running the parser and another fetching data. To bridge the two threads, you use a pipe: a PipedInputStream and PipedOutputStream pair that pushes data from the fetching thread into the input stream used by the parser. (The parser reads its data from the PipedInputStream.)
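The two-thread arrangement described above can be sketched roughly as follows. This is a minimal illustration, not a production design: the class name `PipedStaxDemo`, the method `parseInChunks`, and the short sleep that simulates network gaps are all assumptions made for the example. Note that the parser thread still blocks inside the pipe while it waits for more data.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class: one thread feeds XML chunks into a pipe,
// the calling thread parses them as they arrive.
class PipedStaxDemo {

    /** Feeds the chunks from a second thread and returns the element names seen by the parser. */
    static List<String> parseInChunks(String... chunks)
            throws IOException, XMLStreamException, InterruptedException {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out);

        // Feeder thread: stands in for the code that reads from the socket.
        Thread feeder = new Thread(() -> {
            // try-with-resources closes the pipe, which signals end of document.
            try (PipedOutputStream o = out) {
                for (String chunk : chunks) {
                    o.write(chunk.getBytes(StandardCharsets.UTF_8));
                    o.flush();
                    Thread.sleep(50); // simulated gap between network packets
                }
            } catch (IOException e) {
                e.printStackTrace();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        feeder.start();

        // Parser thread (here: the caller). Reads block until more bytes arrive.
        List<String> names = new ArrayList<>();
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(in, "UTF-8");
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                names.add(reader.getLocalName());
            }
        }
        reader.close();
        feeder.join();
        return names;
    }

    public static void main(String[] args) throws Exception {
        // The document is split mid-way, as it might be when arriving over a socket.
        System.out.println(parseInChunks("<root>\n    <child>", "text</child>\n", "</root>"));
    }
}
```

The pipe is closed only once the whole document has been written; closing it earlier would make the parser see a truncated document.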

mdma
  • I should've clarified that blocking is not an option in my case. When there's no more data available for reading (at present invocation) the parser should treat it like normal situation and provide me with whatever it parsed from the partial data. – Yuri Geinish Feb 10 '11 at 08:12
1

The stream must contain the content for an entire XML document, just not all in memory at the same time (this is what streams do). You might be able to keep the stream and the reader open to continue feeding in content; however, it would have to be part of a well-formed XML document.

Suggestion: You might want to read a bit more about how sockets and streams work before going much farther.

Hope this helps.

cjstehno
  • Yes, potentially the stream must contain an entire document. But why should XMLStreamReader try to validate all of it up front? It's a stream. Why can't it just go along with the data and parse whatever is available? And *if* it encounters an error, I would deal with it myself. Correct me if I'm wrong - you're saying that if I'm reading 1 gigabyte-sized XML document over a network, I should download all of it and only then XMLStreamReader would be able to iterate over it? – Yuri Geinish Apr 16 '10 at 15:19
  • I would think that it would not validate until the whole stream has been processed (and closed). You should not have to download the whole thing; that's what streams are for. Are you writing to the stream, closing it, and then trying to write more? – cjstehno Apr 16 '10 at 15:59
  • Yuri, no, Stax parsers will NOT read it completely first; you can definitely start reading right away, and parser will only block if it does not yet have any data to parse. I don't know what the issue is, but your understanding is correct. – StaxMan Oct 02 '10 at 00:35
  • @StaxMan Blocking is not an option, like I explained in comment to @mdma. – Yuri Geinish Feb 10 '11 at 08:13
  • Ok. Perhaps you could modify the question slightly to state this explicitly? Btw, with respect to Aalto, it is once again active and the async API is complete; the async parser is finally getting completed (there is a fully ready blocking parser and a mostly complete async one). I would love to see you on the mailing list to discuss more and get some feedback. – StaxMan Feb 10 '11 at 17:03
1

If you absolutely need NIO with content "push", there are developers interested in completing the API for Aalto. The parser itself is a complete Stax implementation and also offers an alternative "push input" mode (feeding input instead of using an InputStream). So you might want to check out the mailing lists if you are interested. Not everyone reads StackOverflow questions. :-)

StaxMan
0

Which Java version are you using? With JDK 1.6.0_19, I get the behaviour you seem to be expecting. Iterating over your example XML fragment gives me three events:

  • START_ELEMENT (root)
  • CHARACTERS (whitespace between <root> and <child>)
  • START_ELEMENT (child)

The fourth invocation of next() throws an XMLStreamException: ParseError at [row,col]:[2,12] Message: XML document structures must start and end within the same entity.
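The behaviour described above can be checked with a small sketch along these lines. It assumes the StAX implementation bundled with the JDK; the class name `PartialStaxDemo` and the method `readUntilError` are made up for the example, and the exact event sequence may differ with other parser implementations.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class: iterates over a truncated document and records
// every event the parser reports before it gives up.
class PartialStaxDemo {

    /** Returns the events seen, followed by "ERROR" if the parser threw. */
    static List<String> readUntilError(String xml) {
        List<String> events = new ArrayList<>();
        InputStream input = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(input, "UTF-8");
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        events.add("START_ELEMENT " + reader.getLocalName());
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        events.add("CHARACTERS");
                        break;
                    default:
                        events.add("OTHER");
                }
            }
        } catch (XMLStreamException e) {
            // Reached when the parser hits the end of input with elements still open.
            events.add("ERROR");
        }
        return events;
    }

    public static void main(String[] args) {
        // The fragment from the question: both elements are left unclosed.
        for (String event : readUntilError("<root>\n    <child>")) {
            System.out.println(event);
        }
    }
}
```

The events that precede the error are usable as they arrive; the exception only appears once the parser runs out of input with elements still open.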

jarnbjo
0

Using an XMLEventReader with the StAX parser works for me without any issues.

  final XMLEventReader xmlEventReader= XMLInputFactory
                    .newInstance().createXMLEventReader(new FileInputStream(file));

file is obviously your input.

  while (xmlEventReader.hasNext()) {
      XMLEvent xmlEvent = xmlEventReader.nextEvent();
      logger.debug("LOG XML EVENT " + xmlEvent.toString());
      if (xmlEvent.isStartElement()) {
          // continue implementation
      }
  }

selman
-2

Look at this link to understand more about how streaming parsers work and how they keep your memory footprint smaller. For incoming XML, you would first need to serialize it into well-formed XML and then give that to the streaming parser.

http://www.devx.com/xml/Article/34037/1954

Fazal