
We have a Spring MVC web app (WAR) deployed to Tomcat (6.0.35) that launches a thread inside a separate JVM at deploy time (don't ask why - not my design) and then communicates with that thread via RMI over port 8888.
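
For reference, the webapp reaches that thread through Spring's RMI remoting support - the "Lookup of RMI stub failed" wording in the log below is its client interceptor's message. Stripped down, and with a placeholder interface and binding name (the real ones are different), the client side amounts to roughly this:

    import org.springframework.remoting.rmi.RmiProxyFactoryBean;

    // Placeholder service interface - not our real one.
    interface ImageService { byte[] render(String imageId); }

    public class RmiClientSketch {
        public static void main(String[] args) throws Exception {
            // Simplified sketch of the client-side proxy; "imageService" is a
            // made-up binding name used here purely for illustration.
            RmiProxyFactoryBean proxyFactory = new RmiProxyFactoryBean();
            proxyFactory.setServiceUrl("rmi://localhost:8888/imageService");
            proxyFactory.setServiceInterface(ImageService.class);
            proxyFactory.afterPropertiesSet(); // performs the RMI stub lookup
            ImageService service = (ImageService) proxyFactory.getObject();
        }
    }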

Despite being totally convoluted, this was working perfectly fine up until yesterday. Now the thread is failing at startup, and despite our best efforts to add logging into the mix, we are hitting a wall. This is the only exception we are able to find in the logs:

Jun 12, 2012 3:11:36 AM com.ourapp.ImageController destroy
SEVERE: Shutdown Error: Lookup of RMI stub failed; nested exception is java.rmi.ConnectException: Connection refused to host: localhost; nested exception is:
    java.net.ConnectException: Connection refused
Jun 12, 2012 3:11:37 AM org.apache.catalina.core.StandardContext listenerStop
SEVERE: Exception sending context destroyed event to listener instance of class org.springframework.web.context.ContextLoaderListener
java.lang.NoClassDefFoundError: org/springframework/web/context/ContextCleanupListener
    at org.springframework.web.context.ContextLoaderListener.contextDestroyed(ContextLoaderListener.java:80)
    at org.apache.catalina.core.StandardContext.listenerStop(StandardContext.java:3973)
    at org.apache.catalina.core.StandardContext.stop(StandardContext.java:4577)
    at org.apache.catalina.startup.HostConfig.checkResources(HostConfig.java:1165)
    at org.apache.catalina.startup.HostConfig.check(HostConfig.java:1271)
    at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:296)
    at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
    at org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1337)
    at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1601)
    at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1610)
    at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1590)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ClassNotFoundException: org.springframework.web.context.ContextCleanupListener
    at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1387)
    at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1233)
    ... 12 more

The ImageController is the Spring MVC controller responsible for kicking off this daemon/spawned RMI thread. Based on the wording of this error, does anybody have any idea what might be causing this "connection refused" error?

Running `netstat -an | grep 8888` (this is a Linux machine) produces no output, which means nothing is listening on that port. Thanks in advance for any ideas/suggestions that lead to a fix.
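
In case it helps, here's the sort of throwaway probe we've been running to double-check the netstat result from the Java side (the class name is made up; it's just a quick test):

    import java.rmi.RemoteException;
    import java.rmi.registry.LocateRegistry;
    import java.rmi.registry.Registry;
    import java.util.Arrays;

    // Quick check: is an RMI registry answering on localhost:8888 at all?
    // An immediate ConnectException means nothing is listening; a long hang
    // followed by a timeout would point at a firewall dropping packets instead.
    public class Port8888Probe {
        public static void main(String[] args) {
            try {
                Registry registry = LocateRegistry.getRegistry("localhost", 8888);
                System.out.println("Bound names: " + Arrays.toString(registry.list()));
            } catch (RemoteException e) {
                System.out.println("Probe failed: " + e);
            }
        }
    }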

Edit: Here's another ConnectException we're seeing:

Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
    at java.net.Socket.connect(Socket.java:529)
    at java.net.Socket.connect(Socket.java:478)
    at java.net.Socket.<init>(Socket.java:375)
    at java.net.Socket.<init>(Socket.java:189)
    at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:22)
    at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:128)
    at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:595)
    ... 74 more
  • Are you sure the port/socket you are trying to connect to is not blocked or closed? – Anuj Patel Jun 12 '12 at 10:36
  • These logs seem to be from the webapp JVM, where it basically complains that it couldn't connect through RMI. You need to be investigating the other service's logs and why it didn't start / isn't listening on 8888. – maksimov Jun 12 '12 at 10:37
  • @indyaah - I'm not really familiar with Linux - what commands could I use to tell if port 8888 is blocked/closed? – IAmYourFaja Jun 12 '12 at 10:40
  • @maksimov - can you just clarify what you mean by "other service" - i.e., do I need to investigate why the spawned thread did not start? – IAmYourFaja Jun 12 '12 at 10:41
  • @4herpsand7derpsago Yes - you need to investigate your "separate JVM", which should be listening on 8888 but isn't. – maksimov Jun 12 '12 at 10:47
  • @maksimov - this server is hosted by a separate systems team and this is actually a shared QA machine, which means it's constantly changing. Is it possible that systems or some other process has blocked or closed the port, and if so, how do I check for this (in a manner that is different from the `netstat` I refer to in my question)? Thanks again for all your help so far. – IAmYourFaja Jun 12 '12 at 10:53
  • Your `netstat` output clearly suggests the application is not listening on that port. I'd also check whether the actual application is running (`ps ax | grep java` or similar). That shared QA machine is where I'd be looking to resolve the problem. – maksimov Jun 12 '12 at 10:57
  • @maksimov - thanks again and last question for you: I need to begin ruling things out. I have a suspicion that some external process is blocking/closing that port. Do you know (or anybody else) what tools I can use, or what commands I can run to determine if a port is being blocked or forcefully closed? – IAmYourFaja Jun 12 '12 at 11:00
  • If both client and server are on the same machine, then typically there's nothing that can block the application from talking to port 8888 or otherwise forcefully close it. – maksimov Jun 12 '12 at 11:08
  • It's not possible for someone in systems to accidentally block/reserve a port? – IAmYourFaja Jun 12 '12 at 11:09
  • I'm not aware that this is possible, and I doubt it is. And I doubt someone would - out of the blue - just come in and do it for no reason, even if it were possible. – maksimov Jun 12 '12 at 11:11

2 Answers


I'll move my comments to this answer.

Everything in your logs suggests that the problem is on the other side - the machine (or JVM) you connect to on port 8888. The netstat results prove that there's no process listening on that port - hence the connection errors in your logs.

The next step I would take is to check whether the application on the other machine (or in the other JVM) is running at all (e.g. `ps ax | grep java`). If it is running, check its logs for clues as to why it failed to start the RMI service; if it isn't, start it.

Typically, if an application cannot open a server socket, it's because some other application is already using the port. That's not the case in your situation right now, but it could be that two instances of the same application were started: the first took port 8888, the second couldn't bind it and failed, and the first was then shut down. You end up with nobody listening on the port.
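
If you can touch the code in the spawned JVM, also make sure the part that opens port 8888 logs loudly when it fails. I obviously don't know how yours is written, but the shape of it is usually something like the following (class name and comments are mine, purely illustrative):

    import java.rmi.registry.LocateRegistry;
    import java.rmi.registry.Registry;

    // Sketch only - I don't know how your spawned JVM actually starts its RMI
    // side. The point is that a bind failure here (e.g. "Address already in use"
    // from a clashing second instance) must end up in a log you will actually
    // read, because that failure is exactly what leaves nothing listening on 8888.
    public class RmiDaemonStartupSketch {
        public static void main(String[] args) {
            try {
                Registry registry = LocateRegistry.createRegistry(8888);
                System.out.println("RMI registry created on port 8888: " + registry);
                // ...export and rebind the remote service(s) here...
            } catch (Exception e) {
                System.err.println("FAILED to start RMI registry on port 8888");
                e.printStackTrace();
                System.exit(1);
            }
        }
    }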

maksimov
  • Both the Tomcat web app and this other thread are living on the same server machine and both output to the same log files. – IAmYourFaja Jun 12 '12 at 11:03
  • It seems to me that I need to rule out whether something at the systems level is blocking/closing the port - if that's impossible, please explain why! But that's the only thing that makes sense, seeing that no code changes propagated yesterday... – IAmYourFaja Jun 12 '12 at 11:04
  • Now your edit makes sense! Our build script does some weird things and restarts Tomcat as one of its build steps (instead of hot-deploying the WAR). Is it possible to release a port if a failed app is "using" (hogging) it? – IAmYourFaja Jun 12 '12 at 11:07
  • No, the port is "released" if netstat doesn't see it listening. I'd say it was a momentary clash of two instances. Try a cleaner restart and see what happens. – maksimov Jun 12 '12 at 11:09
  • When you say a cleaner restart, can you be more specific? I'm not trying to be a pain here, just exhausted from spending the last 6 hours looking at the same problem and making very little headway... – IAmYourFaja Jun 12 '12 at 11:23
  • By restart I meant application restart. If your application(s) start only on deployment, then try a clean redeployment and see if this resolves it. – maksimov Jun 12 '12 at 11:25
  • Yes, I've restarted several times already, making sure to `kill -9` all Java PIDs beforehand. I like your theory here and where it is going, but it is disconcerting to me that this is just happening all of a sudden. This never used to happen, which is why I keep thinking this is something at the systems level. – IAmYourFaja Jun 12 '12 at 11:27
  • Can you try and make sure the code that opens 8888 is printing something in the logs in case of a failure? I'd love to help you but I can only rely on my imagination here, so logs would assist. – maksimov Jun 12 '12 at 11:29

does anybody have any idea what might be causing this "connection refused" error?

It's right there in the stack trace:

java.lang.NoClassDefFoundError: org/springframework/web/context/ContextCleanupListener

Your deployment is missing that class - i.e. the JAR file it ships in.
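
One quick way to confirm or rule that out is to ask the webapp's class path directly whether the class resolves and, if so, which JAR it was loaded from - for example with a throwaway check like this, run against the exploded webapp (e.g. `java -cp "WEB-INF/classes:WEB-INF/lib/*" MissingClassCheck`):

    // Throwaway check - the class name comes straight from your stack trace.
    // getCodeSource() shows which JAR (if any) the class was loaded from.
    public class MissingClassCheck {
        public static void main(String[] args) throws ClassNotFoundException {
            Class<?> cls = Class.forName(
                    "org.springframework.web.context.ContextCleanupListener");
            System.out.println("Loaded from: "
                    + cls.getProtectionDomain().getCodeSource().getLocation());
        }
    }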

user207421
  • Not sure if this is it. `ContextLoaderListener` and `ContextCleanupListener` should basically be in the same package of the same JAR - `spring-web-x.x.x.x.jar` - so this error is weird, but it's doubtful that it's the cause of this problem. – maksimov Jun 12 '12 at 10:49
  • I agree with maksimov - like I said *nothing* has changed since yesterday (no deployments, no builds, nothing) so the classpath has not changed. This just stopped working. – IAmYourFaja Jun 12 '12 at 10:51
  • @maksimov If this isn't it, why is that exception being thrown? And why does it appear in the stack trace *below* the 'connection refused' exception? – user207421 Jun 12 '12 at 11:00
  • @EJP I think the cause of this particular issue is that the application was in the advanced stages of a shutdown, and who knows what happened to the class loader at that point. The fact is this error has nothing to do with the network, and it should not happen in normal operation, since all the evidence points to the fact that the classes *must* normally be visible to each other. – maksimov Jun 12 '12 at 11:13