-1

Is there a way to parse an Xml Document from a Socket InputStream without the stream being closed on the client side? I only have control of the server side receiving the Xml, and the socket will remain open since the server will be sending a response back to the client.

Can I tell the parser to stop and return the Document as soon as it finds the root element's closing tag, or would I need to modify the parser for that? Why would it even bother to parse further, since a second root element would make the Document not well-formed anyway? As far as I can tell, it keeps reading after the end tag because it is checking for trailing comments or processing instructions, which I do not care about in my case and would happily ignore.

The Xml I send is well-formed and is properly parsed from a FileInputStream, since it has a clear EOF, but hangs when being parsed from a Socket InputStream that does not close.

The client does not close the stream after sending the Xml because they expect a response over the socket.

Here is my code:

try (
    ServerSocket server = new ServerSocket(port);
    Socket sock = server.accept();
    InputStream in = sock.getInputStream()) {

    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = dbf.newDocumentBuilder();
    db.setErrorHandler(MyErrorHandler);
    db.setEntityResolver(MyEntityResolver);
    // the call below hangs, apparently waiting for the stream to close
    Document doc = db.parse(in);

    // .. process document
    // .. send response
}

Here is the stack trace of where it is hanging:

SocketInputStream.socketRead0(FileDescriptor, byte[], int, int, int) line: not available [native method]    
SocketInputStream.socketRead(FileDescriptor, byte[], int, int, int) line: 116   
SocketInputStream.read(byte[], int, int, int) line: 171 
SocketInputStream.read(byte[], int, int) line: 141  
XMLEntityManager$RewindableInputStream.read(byte[], int, int) line: 2919    
UTF8Reader.read(char[], int, int) line: 302 
XMLEntityScanner.load(int, boolean, boolean) line: 1895 
XMLEntityScanner.skipSpaces() line: 1685    
XMLDocumentScannerImpl$TrailingMiscDriver.next() line: 1371 
XMLDocumentScannerImpl.next() line: 602 
XMLDocumentScannerImpl(XMLDocumentFragmentScannerImpl).scanDocument(boolean) line: 505  
XIncludeAwareParserConfiguration(XML11Configuration).parse(boolean) line: 841   
XIncludeAwareParserConfiguration(XML11Configuration).parse(XMLInputSource) line: 770    
DOMParser(XMLParser).parse(XMLInputSource) line: 141    
DOMParser.parse(InputSource) line: 243  
DocumentBuilderImpl.parse(InputSource) line: 339    
DocumentBuilderImpl(DocumentBuilder).parse(InputStream) line: 121   

Thanks for any suggestions.

xtratic
  • If the stream stays open, how would you know when you've received the full XML document? The client needs to tell you, either by closing the stream, or by telling you the length up-front. – Andreas Sep 19 '18 at 19:33
  • I was hoping it would know that the Xml document is finished when it reads the closing tag of the root element. – xtratic Sep 19 '18 at 21:12

2 Answers

1

If the document is small enough to fit in memory, you might as well read the bytes into a byte array and parse that. If it is large and you want to keep working with streams, have a look at Apache Commons IO's IOUtils, which gives you ways to copy an InputStream to an OutputStream and handle the data later. This way the socket stream should remain open.

  • @Andreas Precisely. I don't know when the full Xml document has been sent until I see the root element closing tag. Currently I'm trying to do some naive manual parsing to do this but things could get sketchy with CDATA and I'd love to just use an existing Xml parser. – xtratic Sep 19 '18 at 21:15
  • OK, I think I get it now: you have a socket on which you receive XMLs back-to-back, and you need to know when one is complete so that you can parse it. There was a Xerces sample back in the day describing a solution; have a look at: https://svn.apache.org/repos/asf/xerces/java/trunk/samples/socket/KeepSocketOpen.java . It uses a WrappedInputStream approach at the server side to make the XMLs appear separate, although they arrive in the same stream (this implies you have access to the server code). – Ioannis Baourdos Sep 19 '18 at 22:22
  • @IoannisBaourdos Unfortunately it seems like this solution requires I have control of both server and client. However, I only have control of the server receiving the Xml. – xtratic Sep 20 '18 at 19:03
0

I've unaccepted my answer as I no longer trust XmlFrameDecoder, since its XML tracking looks somewhat too naive. What is truly needed is an XML parser with an option to return the Document after the root element's closing tag and ignore trailing miscellaneous data.
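
A pull parser (StAX) may come close to this, because it only reads from the stream when asked for the next event. The following is a minimal sketch, not a tested production solution: `RootBoundedParser` and `readOneDocument` are hypothetical names, the DOM copy ignores namespaces, comments and processing instructions, and it assumes the StAX implementation does not try to read past the root end tag before reporting it. In `main`, a ByteArrayInputStream with junk after the root element stands in for the socket stream.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper for illustration; not part of any library.
public class RootBoundedParser {

    /** Builds a DOM Document from one root element and returns as soon as that element closes. */
    public static Document readOneDocument(InputStream in) throws Exception {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        xif.setProperty(XMLInputFactory.SUPPORT_DTD, false);  // do not process DTDs from the network
        xif.setProperty(XMLInputFactory.IS_COALESCING, true); // report CDATA as ordinary text
        XMLStreamReader reader = xif.createXMLStreamReader(in);

        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Node current = doc; // node that receives the next child
        int depth = 0;      // number of currently open elements

        while (true) {
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT: {
                    // Simplification: namespace-unaware copy of the element and its attributes.
                    Element el = doc.createElement(qName(reader.getPrefix(), reader.getLocalName()));
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        el.setAttribute(
                                qName(reader.getAttributePrefix(i), reader.getAttributeLocalName(i)),
                                reader.getAttributeValue(i));
                    }
                    current.appendChild(el);
                    current = el;
                    depth++;
                    break;
                }
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                case XMLStreamConstants.SPACE:
                    if (depth > 0) { // text outside the root element is ignored
                        current.appendChild(doc.createTextNode(reader.getText()));
                    }
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    current = current.getParentNode();
                    if (--depth == 0) {
                        return doc; // root closed: stop without asking for trailing misc
                    }
                    break;
                case XMLStreamConstants.END_DOCUMENT:
                    throw new IllegalStateException("stream ended before the root element was closed");
                default:
                    break; // comments, processing instructions, DTD events are skipped
            }
        }
    }

    private static String qName(String prefix, String localName) {
        return (prefix == null || prefix.isEmpty()) ? localName : prefix + ":" + localName;
    }

    public static void main(String[] args) throws Exception {
        // Junk after </root> would make a whole-stream parse fail; here it is never scanned.
        byte[] bytes = "<root id=\"1\"><a>x</a><![CDATA[</root>]]></root>NOT XML <<<"
                .getBytes(StandardCharsets.UTF_8);
        Document doc = readOneDocument(new ByteArrayInputStream(bytes));
        System.out.println(doc.getDocumentElement().getTagName());
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

One caveat with this approach: the parser may already have buffered bytes that arrived after the closing tag, so if the client sends several documents back-to-back, wrapping the same InputStream in a fresh reader for each message could lose data.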


I think I've found a good solution and figured I'd share it for anyone else in a similar boat.

Rather than using a raw Socket, I would use Netty to build my socket protocol, with an XmlFrameDecoder to frame the messages and a handler that parses the bytes of each frame into a Document.

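import io.netty.bootstrap.ServerBootstrap;
import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufInputStream;
import io.netty.channel.ChannelFuture;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelOption;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;
import io.netty.handler.codec.xml.XmlFrameDecoder;
import java.io.InputStream;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class Main {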
    private static class MyXmlHandler extends ChannelInboundHandlerAdapter {

        @Override
        public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
            try (InputStream in = new ByteBufInputStream((ByteBuf) msg, true)) {
                Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);

                // prove that we got the document by serializing and printing it
                Transformer transformer = TransformerFactory.newInstance().newTransformer();
                transformer.setOutputProperty(OutputKeys.INDENT, "yes");
                StringWriter writer = new StringWriter();
                transformer.transform(new DOMSource(doc), new StreamResult(writer));
                System.out.println(writer);
            }
        }
    }


    public static void main(String[] args) throws InterruptedException {
        final int PORT = 8080;

        EventLoopGroup parentGroup = new NioEventLoopGroup();
        EventLoopGroup childGroup = new NioEventLoopGroup();
        try {
            ServerBootstrap server = new ServerBootstrap();
            server.group(parentGroup, childGroup).channel(NioServerSocketChannel.class)
                    .childHandler(new ChannelInitializer<SocketChannel>() {

                        @Override
                        public void initChannel(SocketChannel ch) throws Exception {
                            ch.pipeline().addLast(new XmlFrameDecoder(Integer.MAX_VALUE),
                                    new MyXmlHandler());
                        }
                    }).childOption(ChannelOption.SO_KEEPALIVE, true);

            ChannelFuture channel = server.bind(PORT).sync();
            channel.channel().closeFuture().sync();
        } finally {
            childGroup.shutdownGracefully();
            parentGroup.shutdownGracefully();
        }
    }
}
xtratic