
I'm receiving byte arrays (actually Netty `ByteBuf`s) from the underlying network layer in a pipeline handler like this:

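```java
class Handler {
    ...
    private SAXParser parser = ...;
    // SAXParser.parse(InputStream, ...) accepts a DefaultHandler, not a bare ContentHandler
    private DefaultHandler handler = ...;

    // Called several times per request, once for each chunk that arrives
    void process(byte[] request) {
        parser.parse(???, handler);
    }
}
```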

`Handler.process()` is called multiple times per request, as the data arrives from the network. How can I feed the data to the parser without buffering each request into a single huge chunk?

Oroboros102

2 Answers


Use `new ByteArrayInputStream(request)`.

kan
  • Would you be so kind as to provide an example of using `ByteArrayInputStream` with `SAXParser` and `Handler.process()`? – Oroboros102 Aug 20 '14 at 08:29
  • I'm asking for an example because `parser.parse(new ByteArrayInputStream(request), handler);` feeds only part of the document to the parser. – Oroboros102 Aug 20 '14 at 08:53
  • @Oroboros102 Unfortunately, since the parser is not push-based and pulls the bytes it needs from its input, you would need multithreading: one thread does the parsing while another feeds `byte[]` chunks to the parser thread. You also need some way to detect the end of each XML document in the network stream, i.e. to split the continuous byte stream into separate messages. Do you need sample code? – kan Aug 20 '14 at 09:50
  • Thanks for answering. Could a **StAX** parser be a solution and parse the XML in the same event loop? – Oroboros102 Aug 20 '14 at 10:33
  • @Oroboros102 No, a StAX parser also expects an `InputStream` to pull data from. I am not aware of any library that can parse XML from a byte stream pushed into it. I tried to create such a library a couple of years ago but abandoned the project... maybe I should resume it... – kan Aug 20 '14 at 12:19
  • I searched for "asynchronous xml parser" and found the Aalto project. I'll give it a try. Thanks again for clearing all that up. I thought I was just doing it wrong and didn't even understand the whole problem. – Oroboros102 Aug 20 '14 at 12:46
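kan's two-thread suggestion can be sketched with the JDK's `PipedInputStream`/`PipedOutputStream`. This is a minimal sketch under stated assumptions, not code from the thread: `PipedSaxFeeder` is a hypothetical helper, `feed()` would be called from `Handler.process()`, and the caller is assumed to already know where a document ends (the framing problem kan mentions) and to call `finish()` there. The 64 KiB pipe size is arbitrary. Note that `feed()` blocks once the pipe is full, so on a Netty event loop a slow parser would stall the channel.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical helper: the network thread pushes byte[] chunks with feed(),
 * while a dedicated thread runs an ordinary blocking SAX parse over a pipe.
 */
public class PipedSaxFeeder {
    private final PipedOutputStream out = new PipedOutputStream();
    private final ExecutorService parserThread = Executors.newSingleThreadExecutor();
    private final Future<?> parsing;

    public PipedSaxFeeder(DefaultHandler handler) throws IOException {
        // 64 KiB pipe buffer (an assumption); feed() blocks while it is full.
        PipedInputStream in = new PipedInputStream(out, 64 * 1024);
        parsing = parserThread.submit(() -> {
            // Runs on the parser thread; read() blocks until the network thread writes more bytes.
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
            return null;
        });
    }

    // Called from the network thread for every chunk, e.g. from Handler.process().
    public void feed(byte[] chunk) throws IOException {
        out.write(chunk);
    }

    // Called once the caller has detected the end of the document; waits for parsing to finish.
    public void finish() throws Exception {
        out.close(); // the parser now sees end-of-stream
        try {
            parsing.get(); // a SAXException from the parser surfaces here as an ExecutionException
        } finally {
            parserThread.shutdown();
        }
    }
}
```

In `Handler`, one feeder per request would replace the direct `parser.parse(...)` call: `process()` forwards each `request` array to `feed()`, and `finish()` runs when the request is complete. Chunk boundaries can fall anywhere, even inside a tag name, because the parser only ever sees one continuous stream.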

Almost all XML parsers assume that the source always yields the bytes they ask for. When not enough bytes are available yet, they expect the source to block until they are.

This design conflicts with non-blocking sources such as a Netty channel.
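To make the pull model concrete, here is a small JDK-only sketch (`PullParsingDemo` and its helpers are hypothetical names). Each call to `SAXParser.parse()` treats end-of-stream as end-of-document, so parsing one chunk on its own fails, which matches what the asker saw with `ByteArrayInputStream`. The same bytes presented as one stream via `SequenceInputStream` parse fine, because the parser just keeps pulling across the chunk boundary.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class PullParsingDemo {

    // Records the name of every element the parser reports.
    public static class ElementCollector extends DefaultHandler {
        public final List<String> elements = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            elements.add(qName);
        }
    }

    // One blocking parse: pulls from `in` until end-of-stream, which it treats as end-of-document.
    public static List<String> parse(InputStream in) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ElementCollector collector = new ElementCollector();
        parser.parse(in, collector);
        return collector.elements;
    }

    public static InputStream bytes(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        String chunk1 = "<order><item>a</item>";
        String chunk2 = "<item>b</item></order>";

        // First chunk alone: the stream ends mid-document, so SAX reports a fatal error.
        try {
            parse(bytes(chunk1));
        } catch (SAXParseException e) {
            System.out.println("chunk1 alone fails: " + e.getMessage());
        }

        // Both chunks exposed as ONE stream: the parser pulls straight across the boundary.
        System.out.println(parse(new SequenceInputStream(bytes(chunk1), bytes(chunk2)))); // prints [order, item, item]
    }
}
```

`SequenceInputStream` only works here because every chunk is already in hand; with a live network source the parser would have to block waiting for the next chunk, which is exactly the mismatch with a non-blocking channel.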

To work around this impedance mismatch, you need to ensure that your `ByteBuf` contains a complete XML document. You can do that with Netty's `XmlFrameDecoder`. Once `XmlFrameDecoder` produces a `ByteBuf` containing a complete XML document, you can feed it to your favorite parser by wrapping the buffer in a `ByteBufInputStream`. For example:

```java
// buf: a ByteBuf holding one complete XML document emitted by XmlFrameDecoder
InputStream in = new ByteBufInputStream(buf);
parser.parse(in, handler);
```
trustin
  • To get the full XML document, I can just use `HttpObjectAggregator` in my case. The thing is, I'm trying to avoid buffering the whole XML, because it may be really huge. – Oroboros102 Aug 21 '14 at 10:53
  • Then you need an XML parser that works with a non-blocking data source, such as Aalto: https://github.com/FasterXML/aalto-xml – trustin Aug 21 '14 at 23:51
  • Thanks, that's what I'm using right now. And it works! – Oroboros102 Aug 22 '14 at 10:33