1

My Java application is sometimes killed by an external script, using either SIGTERM or SIGKILL.

The application is a server which receives many connections per second, and it can be killed while trying to serve them.

I would like to restart the application whenever it's killed, so I have prepared a script for that purpose.

The problem is that, once the app has been killed, the new instance can't bind to the port used by the previous one and fails with "Address already in use". The previous instance's process has definitely terminated, yet the listening port is still there, now assigned to bash (or sh on other machines).

Obviously, my goal is to restart the application and have it bind successfully to the previous address.

I've tried waiting more than 200 seconds before restarting, to no avail, and in any case I can't afford to wait that long.

I've encountered this problem on every machine I've run the application on (it is a Jetty server on Java 1.6).

Any suggestion is appreciated, thanks,

Silvio

EDIT: Killing the JVM process is not the normal way I exit my application; it is used only in case of problems (OutOfMemoryErrors). I never actually need SIGKILL, because SIGTERM always suffices: I would resort to SIGKILL only if SIGTERM failed, which has never happened. I'm working on a long-term solution; meanwhile I have to keep my app running by applying stitches here and there.

EDIT: To be clearer, this is the line I see in the output of netstat -tunap | grep before killing the process:

tcp6       0      0 :::8898       :::*        LISTEN      22709/java

and this is after killing the process

tcp6       0      0 :::8898       :::*        LISTEN      23665/sh

Notice that the process with PID 22709 has been killed and is gone, but the port is still there, now held by sh.

UPDATE: After I kill my application, netstat shows a long list of pending connections in CLOSE_WAIT state, with my IP as destination. I can also see a sh process in LISTEN state on my port. When I kill it, a sleep process replaces it and listens on the same port. When I finally kill this sleep process, the port is released and I can restart my server successfully. That could be a way to get my port released, but I fear that automatically killing processes in order to release a port is a bit risky.

Silvio Donnini

5 Answers

3

The server still expects some packets from the clients after the listening sockets are closed, so it keeps the port assigned. The application may use the SO_REUSEADDR socket option to allow immediate reuse of the socket address.

Here is an excerpt from my Linux ip(7) manual page:

A TCP local socket address that has been bound is unavailable for some time after closing, unless the SO_REUSEADDR flag has been set. Care should be taken when using this flag as it makes TCP less reliable.

The application or application server might have a configuration setting for using this socket option.
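
If the listening socket is created by your own code, a minimal sketch of setting the option in Java could look like the following. The class name ReusableListener and its open method are hypothetical helpers made up for illustration, not part of Jetty; the port 8898 is the one from the netstat output in the question.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper (not a Jetty API): opens a listening socket
// with SO_REUSEADDR enabled.
final class ReusableListener {

    static ServerSocket open(int port) throws IOException {
        // Create the socket unbound so the option can be set before bind();
        // changing it after the socket is bound has no defined effect.
        ServerSocket server = new ServerSocket();
        server.setReuseAddress(true);
        server.bind(new InetSocketAddress(port));
        return server;
    }

    public static void main(String[] args) throws IOException {
        // Port from the question's netstat output, unless one is given.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8898;
        ServerSocket server = open(port);
        System.out.println("listening on port " + server.getLocalPort());
        server.close();
    }
}
```

This only helps when the old socket is lingering in TIME_WAIT. It does not help while another live process still holds the port in LISTEN state.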

Jacek Konieczny
  • hmm, please take special note of *Care should be taken when using this flag as it makes TCP less reliable* Sysadmin mantra: don't solve one problem, by breaking something else. – The Unix Janitor Mar 18 '10 at 15:08
  • In this case it's fine. A perusal of `netstat` output would almost certainly show that the socket is in TIME_WAIT, which would only matter if there was still a chance that something useful could arrive on the socket. Since the application that would handle that info is already dead, it doesn't matter if it's reused. – Insyte Mar 18 '10 at 15:44
  • I added the netstat output just now, I should have included it right from the start. It seems not to be in TIME_WAIT state, but always in LISTEN – Silvio Donnini Mar 18 '10 at 15:52
1

You're not actually killing your Java application; you're killing the Java virtual machine (JVM) instance that is running it.

This is not the ideal way of terminating your Java process.

If you have to kill your JVM with kill -9, the JVM won't be able to clean up after itself, leaving operating system resources in limbo. :-(

Add some functionality to your app to make it exit gracefully. If you have no choice, then try to kill your JVM with -15; it may help it clean up after itself.

If your Java program really is hanging the JVM, then you need to get a debugger and squash those pests.

Killing a process and restarting it is a hack, not a fix. You should only use SIGKILL if a process is not responding to any other method.

I usually try

kill -15

then only kill -9 as a last resort.

and for fun...

http://www.youtube.com/watch?v=Fow7iUaKrq4

The Unix Janitor
  • Yes, but that is not the way I exit my app normally. This is for emergency situations (i.e. an OutOfMemoryError), and I first try to kill it with SIGTERM, which always succeeds. Should kill -15 ever fail, I would resort to kill -9, but that has never been necessary – Silvio Donnini Mar 18 '10 at 15:32
1

Since you only do this manually, you may have to add another check.

netstat -p

and kill the pid associated with your open socket, even if it is bash or sh.

Also, you mentioned that most of the time SIGTERM works. If that's the case, your app should catch the SIGTERM and jump into some graceful exit code that RSTs all open connections and then closes the socket.
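
A rough sketch of such exit code in Java, assuming the application creates and accepts its own sockets: the JVM runs shutdown hooks when it receives SIGTERM (but not SIGKILL), so the hook below aborts the tracked client connections and closes the listener. GracefulShutdown and its track, untrack, shutdown and install methods are hypothetical names for illustration, not an existing API.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: remembers accepted connections so they can be
// reset and the listening socket closed when the JVM shuts down.
final class GracefulShutdown {
    private final ServerSocket server;
    private final Set<Socket> clients =
            Collections.synchronizedSet(new HashSet<Socket>());

    GracefulShutdown(ServerSocket server) {
        this.server = server;
    }

    // Call after accept() and again when a connection finishes normally.
    void track(Socket client) { clients.add(client); }
    void untrack(Socket client) { clients.remove(client); }

    // Aborts every tracked connection, then closes the listener.
    void shutdown() {
        synchronized (clients) {
            for (Socket client : clients) {
                try {
                    // Linger time 0 makes close() abort the connection
                    // with a RST instead of the normal FIN handshake.
                    client.setSoLinger(true, 0);
                    client.close();
                } catch (IOException ignored) {
                    // The socket was probably closed already.
                }
            }
            clients.clear();
        }
        try {
            server.close();
        } catch (IOException ignored) {
            // Nothing useful can be done this late in shutdown.
        }
    }

    // Registers shutdown() as a JVM shutdown hook (runs on SIGTERM).
    Thread install() {
        Thread hook = new Thread(new Runnable() {
            public void run() { shutdown(); }
        }, "graceful-shutdown");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

A server loop would call track(socket) right after accept() and install() once at startup. None of this runs if the process is killed with SIGKILL.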

HTH

Scott Lundberg
  • That could be a way, unfortunately my app won't be able to catch any signal. The reason I need to kill it is because, after throwing an OutOfMemoryError, the jvm basically becomes unstable and not able to execute code correctly – Silvio Donnini Mar 18 '10 at 15:59
  • I'm going to try killing every process that gets ahold of the listening socket, just to see if that's the problem – Silvio Donnini Mar 18 '10 at 16:21
  • @Silvio - just a guess here... I think because you are starting your jvm app from the shell, it's showing as the master process. (You could verify with a pstree when the java app is running.), when the child dies, the socket gets inherited by the sh process... – Scott Lundberg Mar 18 '10 at 16:41
  • That would make sense, but I checked with pstree as you said, and it's a direct child of init not sh – Silvio Donnini Mar 18 '10 at 16:56
  • @Silvio. Not sure how the sh process is getting the socket then... Are you doing any kind of c'ish type coding with forking where the pid is transferred? – Scott Lundberg Mar 18 '10 at 17:18
0

If you have access to the source code, you need to create the socket with the SO_REUSEADDR option mentioned by Jacek. Also of interest are the tcp_tw_recycle and tcp_tw_reuse kernel flags (on Linux).

The real problem is in the protocol design, which you may or may not be able to change. Interesting threads on the topic:

Insyte
0

With your update I have another explanation. The sh process keeping the socket open must be a child of your application, forked after the listening socket has been opened. It didn't die with its parent and was adopted by the init process.

You should try to find out what that shell process is for (probably some script started by your application) and why it is not terminating. Maybe it will be enough to fix the script so that it terminates after finishing its job. Or there may be a way to keep it from detaching from the parent (it should die with the parent if it is part of the same process group), or to make it close all the unneeded file descriptors inherited from the parent.

You may try:

lsof -p $pid_of_the_sh_process

to see what other files it keeps open. One of these will most probably be the shell script. Once we know what it is, we may find a way to fix the problem.
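
If it turns out that the application itself starts the script (an assumption; the thread does not confirm it), one of the options above, making sure the child does not outlive its parent, could be sketched in Java like this. ChildReaper and its methods are hypothetical names for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: starts external commands and destroys them again
// when the JVM shuts down, so they do not linger with inherited resources.
final class ChildReaper {
    private final List<Process> children = new ArrayList<Process>();

    // Starts a command and remembers the resulting process.
    synchronized Process start(String... command) throws IOException {
        Process child = new ProcessBuilder(command).start();
        children.add(child);
        return child;
    }

    // Terminates every child started through this helper.
    synchronized void destroyAll() {
        for (Process child : children) {
            child.destroy();
        }
        children.clear();
    }

    // Registers destroyAll() as a shutdown hook (runs on SIGTERM,
    // not on SIGKILL).
    void install() {
        Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
            public void run() { destroyAll(); }
        }, "child-reaper"));
    }
}
```

Process.destroy() only reaches the direct child. Anything that child has started itself, such as the sleep process seen in the question, would still have to exit on its own or be handled by the script.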

Jacek Konieczny