
I have a Java Cocoon application that performs XSLT transformations using Saxon 8.7. One of the stylesheets uses the document() function to pull in the contents of a remote XML resource. The invocation looks like this: <xsl:apply-templates select="document(@href)/p-topic" mode="static-topic"/>

The remote document is accessible (tested with wget), and no proxy is used for that remote host. However, I get the following exception stack trace:

Caused by: org.apache.commons.lang.exception.NestableRuntimeException: net.sf.saxon.trans.DynamicError: net.sf.saxon.trans.DynamicError: java.net.ConnectException: Connection timed out
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
        at java.net.Socket.connect(Socket.java:519)
        at java.net.Socket.connect(Socket.java:469)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:157)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:388)

The timeout occurs after about 20 seconds.

The issue appeared after moving the application to another server. On the original server the same code works fine, so it depends on the environment.

I've also analyzed the netstat output. The following connections appear while the XSLT transformation is running:

Proto Recv-Q Send-Q Local Address               Foreign Address             State       User       Inode      PID/Program name
tcp        0      1 ::ffff:134.27.100.67:37600  ::ffff:134.27.97.142:8510   SYN_SENT    22484/java
tcp     8559      0 ::ffff:134.27.100.67:55835  ::ffff:134.27.97.143:80     ESTABLISHED  22484/java

134.27.97.143:80 is the location of the target remote XML resource. I have no idea why a connection in the SYN_SENT state to another server (134.27.97.142:8510) appears.

After about 5 seconds the second connection changes to the following:

tcp     8560      0 ::ffff:134.27.100.67:55835  ::ffff:134.27.97.143:80     CLOSE_WAIT  22484/java

After about 15 more seconds (by which point the browser has timed out), the first connection disappears and the second connection changes to the following:

tcp     0      0 ::ffff:134.27.100.67:55835  ::ffff:134.27.97.143:80     CLOSE_WAIT  22484/java

After about 5 more seconds the second connection also disappears. I'm not a netstat expert, but it seems suspicious that the Recv-Q value stays non-zero until the timeout happens. It looks like the application hangs while reading data from the TCP socket receive queue. I've tried different Tomcat versions (5 and 6). Any ideas?
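A small probe run from the same machine and JVM can show which of the two endpoints in the netstat output is actually reachable from Java. The following is only a minimal sketch and not part of the application: the class name ConnectProbe, the helper canConnect, and the 5-second timeout are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ConnectProbe {

    // Returns true if a TCP connection to host:port is established
    // within timeoutMs milliseconds, false on timeout or any other I/O error.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Endpoints taken from the netstat output above.
        System.out.println("134.27.97.143:80   -> " + canConnect("134.27.97.143", 80, 5000));
        System.out.println("134.27.97.142:8510 -> " + canConnect("134.27.97.142", 8510, 5000));
    }
}
```

If the second endpoint reports false while the first reports true, the timeout is more likely tied to the connection stuck in SYN_SENT than to the download of the XML resource itself.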

lagivan
  • It looks related to [this topic](http://serverfault.com/questions/525929/tomcat-not-getting-data-from-tcp-recv-q-hanging). But there is a lot of free memory left, so the root cause must be different. – lagivan Oct 23 '13 at 11:20
  • You said that it was caused by the migration. Could you please provide more details about the technical changes? – shapiy Oct 23 '13 at 11:37
  • Sure. The old server runs RHEL 3.9, the new one runs RHEL 5.6. Everything else is the same (I've tried the same JDK and Tomcat). The two servers are in different security zones in the datacenter, so there might be some accessibility differences due to different firewalls. However, I don't think it should matter because I can download the remote XML resource from the new server with wget. – lagivan Oct 23 '13 at 12:05
  • Sorry, can't help much with this. Saxon of course is just using standard Java library stuff like URL.openConnection(). – Michael Kay Oct 24 '13 at 21:16
  • Does the timeout occur always or intermittently? – kjhughes Oct 27 '13 at 01:49
  • Always. I've come to the conclusion that it must be something in the server configuration, so I've involved technical support. – lagivan Oct 28 '13 at 14:13
  • I wouldn't rule out a security issue. The fact that you have access using "wget" called by the system does not mean you will have access through Java. Did you try disabling all security and testing again? – OmegaZiv Oct 28 '13 at 15:02
  • Sorry, what security do you mean? System level? – lagivan Oct 29 '13 at 13:24
  • A firewall, for example, might allow the wget connection but block Java. I see the issue was access to the DTD. But why did you fail to download the DTD? – OmegaZiv Nov 10 '13 at 14:14

1 Answer


I've finally managed to identify the root cause. It turned out that the remote documents have DOCTYPE declarations that reference external DTD files. Those DTD files are not accessible from the new server, while they are accessible from the old one. So it seems Saxon tries to download the DTDs for validation and fails with the "Connection timed out" exception.
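If opening access to the DTD host is not an option, one common workaround is to stop the XML parser from fetching the DTD at all. The following is a minimal sketch using only the standard JAXP and SAX classes from the JDK, not Saxon or Cocoon APIs. The class name SkipExternalDtd, the helper parse, and the DTD URL are hypothetical. Note that answering with an empty DTD also drops any entities or default attribute values the real DTD would have supplied.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class SkipExternalDtd {

    // Answers every external entity request (including the DTD named in a
    // DOCTYPE declaration) with empty content, so no network fetch happens.
    static final EntityResolver EMPTY_RESOLVER =
            (publicId, systemId) -> new InputSource(new StringReader(""));

    // Parses the given XML text without validation and without loading the DTD.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(EMPTY_RESOLVER);
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document whose DTD lives on a host that cannot be reached.
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE p-topic SYSTEM \"http://dtd.example.invalid/p-topic.dtd\">\n"
                + "<p-topic><title>Hello</title></p-topic>";
        Document doc = parse(xml);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

How such a resolver gets wired into the parser that Saxon uses for document() depends on the Cocoon and Saxon configuration and is not shown here. A resolver backed by local copies of the DTDs would be the alternative if the stylesheets rely on entities defined there.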

lagivan