10

What is the best way to relaunch an application that was listening on a TCP port? The problem is: if I relaunch the application quickly, it fails because the socket it was listening on is still in use.

How can I safely relaunch in this case?

socket.error: [Errno 98] Address already in use

Code:

#!/usr/bin/python
import sys,os
import pygtk, gtk, gobject
import socket, datetime, threading
import ConfigParser
import urllib2
import subprocess

def server(host, port):
  sock = socket.socket()
  sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
  sock.bind((host, port))
  sock.listen(1)
  print "Listening... " 
  gobject.io_add_watch(sock, gobject.IO_IN, listener)


def listener(sock, *args):
  conn, addr = sock.accept()
  print "Connected"
  gobject.io_add_watch(conn, gobject.IO_IN, handler)
  return True

def handler(conn, *args):
  line = conn.recv(4096)
  if not len(line):
    print "Connection closed."
    return False
  else:
    print line
    if line.startswith("unittest"):
      subprocess.call("/var/tmp/runme.sh", shell=True)
    else:
      print "not ok"
  return True

server('localhost', 8080)
gobject.MainLoop().run()

runme.sh

#!/bin/bash
ps aux | grep py.py | awk '{print $2}' | xargs kill -9;
export DISPLAY=:0.0 && lsof -i tcp:58888 | grep LISTEN | awk '{print $2}' | xargs kill -9;
export DISPLAY=:0.0 && java -cp Something.jar System.V &
export DISPLAY=:0.0 && /var/tmp/py.py &

EDIT: Note that I am using Java and Python together as one application with two layers, so runme.sh is my startup script that launches both at the same time. From Java I press the Python relaunch button, but Python does not relaunch because the kill is done via Bash.

  • So did you figure out why your code wasn't setting `SO_REUSEADDR`? – Matthew Adams Dec 13 '12 at 04:01
  • @MatthewAdams: not yet. still it fails. –  Dec 13 '12 at 05:09
  • I've looked at a ton of other questions about this same issue now, and it seems like EJP is totally right about `SO_REUSEADDR`. I still don't see why your code can't immediately reconnect since it looks like you are setting `SO_REUSEADDR`... – Matthew Adams Dec 13 '12 at 22:18
  • I think it has got to be something with `gobject`'s io monitoring... – Matthew Adams Dec 14 '12 at 18:58
  • can you pass the file-descriptor of the socket to the other, new process? – User Dec 18 '12 at 13:30
  • Are you saying that `runme.sh` is executed from 2 different contexts in your startup? once manually to start everything, and then again from java? that is eventually the "call stack" is runme-java-runme-java-runme-java-...? – Dima Tisnek Dec 22 '12 at 12:28

7 Answers

3

You will have to find the Python equivalent of setting SO_REUSEADDR on the socket before you bind it. Ensuring the socket is closed on exit as recommended in other answers is neither necessary nor sufficient, as (a) sockets get closed by the OS when the process exits, and (b) you still have to overcome accepted connections in the TIME_WAIT state, which only SO_REUSEADDR can do.
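
In Python the equivalent is a single `setsockopt()` call made before `bind()`, which the question's `server()` function already does. As a minimal sketch (written for Python 3; the helper name `make_listener` is only illustrative), the ordering looks like this:

```python
import socket


def make_listener(host, port, backlog=1):
    """Create a listening TCP socket with SO_REUSEADDR set before bind().

    make_listener is a hypothetical helper name used for illustration.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option has to be set before bind() to affect this bind.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock
```

Calling `make_listener('localhost', 8080)` would correspond to the socket setup in the question. On Linux this lets a new process bind while old connections are in TIME_WAIT, but not while another socket is still listening on the same address and port.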

user207421
  • You definitely want to close the socket though, right? That has always solved the problem for me in the past and it just seems like good practice... – Matthew Adams Dec 10 '12 at 21:02
  • @MatthewAdams The socket gets closed by the OS when the process exits. By all means close it in your normal code, but there's no need to go to the heroic lengths outlined in your answer. – user207421 Dec 11 '12 at 07:30
  • (+1) Ah. Although to be fair, the "heroic lengths" was just three lines of code plus using a different `kill` flag... – Matthew Adams Dec 12 '12 at 19:04
  • Although looking at [the docs](http://docs.python.org/2/library/socket.html#socket.AF_INET) (scroll all the way to the bottom), it seems like @YumYumYum is setting `SO_REUSEADDR` with this line: `sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)` – Matthew Adams Dec 12 '12 at 19:30
2

1.

You have a problem killing your python

air:~ dima$ ps aux | grep i-dont-exist.py | awk '{print $2}'
34198

Which means that your grep process gets caught up in and killed by your restart logic.

On linux you could use pidof instead.

Alternatively use start-stop-daemon and pid file.

2.

You already reuse the address, so my guess is that your Python process doesn't die fast enough.

For a quick test, add a sleep before you start Python again.

If this helps, add a sleep-wait loop after the kill command and only start the new Python process once you are sure the old one is no longer running.
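
A sleep-wait loop of this kind could also live in a small Python launcher instead of the shell script. The following is only a sketch for Python 3 on a POSIX system; `pid_alive` and `wait_until_gone` are made-up helper names, and the timeout values are arbitrary:

```python
import os
import time


def pid_alive(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 sends nothing; it only checks the PID
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def wait_until_gone(pid, timeout=5.0, interval=0.1):
    """Poll until the process has exited; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while pid_alive(pid):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

Note that a killed child that has not been reaped yet (a zombie) still counts as existing for this check, so the parent has to wait on it first.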

Dima Tisnek
2

Is there any chance your Python program spawns other processes? e.g. through fork, subprocess or os.system?

It is possible that your listening file descriptor is inherited by the spawned process:

os.system("sleep 1000") # without sockets:

ls -l /proc/`pidof sleep`/fd
total 0
lrwx------ 1 user user 64 2012-12-19 19:52 0 -> /dev/pts/0
lrwx------ 1 user user 64 2012-12-19 19:52 1 -> /dev/pts/0
l-wx------ 1 user user 64 2012-12-19 19:52 13 -> /dev/null
lrwx------ 1 user user 64 2012-12-19 19:52 2 -> /dev/pts/0

socket(); setsockopt(); bind(); listen(); os.system("sleep 1000") # with sockets:

ls -l /proc/`pidof sleep`/fd
total 0
lrwx------ 1 user user 64 2012-12-19 19:49 0 -> /dev/pts/0
lrwx------ 1 user user 64 2012-12-19 19:49 1 -> /dev/pts/0
l-wx------ 1 user user 64 2012-12-19 19:49 13 -> /dev/null
lrwx------ 1 user user 64 2012-12-19 19:49 2 -> /dev/pts/0
lrwx------ 1 user user 64 2012-12-19 19:49 5 -> socket:[238967]
lrwx------ 1 user user 64 2012-12-19 19:49 6 -> socket:[238969]

Perhaps your Python script died but its children did not. The children keep a reference to the listening socket, and thus the new Python process cannot bind to the same address.
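
One way to rule this out is to keep the listening descriptor from leaking into child processes. The question's handler starts runme.sh with `subprocess.call`, and on Python 2 `close_fds` defaults to False, so the shell script and everything it launches may inherit the listening socket. The sketch below is written for Python 3 on POSIX, where descriptors are non-inheritable by default, so it mainly illustrates what would have to be done explicitly on Python 2. The helper names are hypothetical:

```python
import fcntl
import socket
import subprocess


def set_cloexec(sock):
    """Mark the socket close-on-exec so exec'd children do not keep it open."""
    flags = fcntl.fcntl(sock.fileno(), fcntl.F_GETFD)
    fcntl.fcntl(sock.fileno(), fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


def run_without_inherited_fds(command):
    """Run a shell command without handing it our open descriptors.

    close_fds=True closes every descriptor except 0, 1 and 2 in the child.
    """
    return subprocess.call(command, shell=True, close_fds=True)
```

Either measure alone should be enough: `set_cloexec(sock)` right after creating the socket, or `close_fds=True` on the call that launches the restart script.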

Dima Tisnek
1

Here is my guess: kill is asynchronous. It just tells the kernel to send a signal to the process; it does not wait for the signal to be delivered and handled. Before restarting the process you should use the 'wait' command.

$ wait $PID
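
The shell's `wait` only works for children of the same shell, so it applies when the script that kills the process is also the one that started it. The same signal-then-wait idea, sketched in Python 3 for a child started with `subprocess.Popen` (the function name `stop_and_reap` and the timeout are assumptions for illustration):

```python
import signal
import subprocess


def stop_and_reap(proc, timeout=5.0):
    """Ask a child process to stop, then block until it has really exited."""
    proc.terminate()  # sends SIGTERM on POSIX and returns immediately
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # escalate to SIGKILL if SIGTERM was not enough
        return proc.wait()
```

Only after `stop_and_reap` returns is it safe to assume the old process has released its resources and to start the replacement.
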
alexp
1

Possible solution #1: Fork and exec the new copy of your python script from the old one. It will inherit the listening socket. Then, if desired, detach it from the parent and kill (or exit) the parent. Note that the parent (old version) can finish servicing any existing requests even as the child (new version) handles any new incoming requests.

Possible solution #2: Signal the old running script to hand over the socket to the new script with sendmsg() and SCM_RIGHTS, then kill the old script. This sample code talks about "file descriptors" but works fine with sockets too. See: How to hand-over a TCP listening socket with minimal downtime?

Possible solution #3: If bind() returns EADDRINUSE, wait for a little while and retry until it succeeds. If you need to restart the script quickly and with no downtime in between, this won't work, of course :)

Possible solution #4: Don't kill your process with kill -9. Kill it with some other signal instead, for example SIGTERM. Catch SIGTERM and call gobject.MainLoop.quit() when you get that.

Possible solution #5: Make sure the parent process of your python script (for example the shell) waits on it. If the parent process of the script is not running, or if the script is daemonized, then if killed with SIGKILL, init will become its parent. init calls wait periodically but it may take a bit of time, this is probably what you're running into. If you must use SIGKILL but you want faster cleanup just call wait yourself.
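
As an illustration of solution #3, a retry loop around `bind()` might look roughly like the following Python 3 sketch. The function name `bind_with_retry` and the default attempt count and delay are assumptions, not values from the question:

```python
import errno
import socket
import time


def bind_with_retry(host, port, attempts=50, delay=0.1):
    """Try to bind and listen, retrying while the address is still in use."""
    for attempt in range(attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind((host, port))
        except OSError as exc:
            sock.close()
            # Give up on other errors, or once the last attempt has failed.
            if exc.errno != errno.EADDRINUSE or attempt == attempts - 1:
                raise
            time.sleep(delay)
        else:
            sock.listen(1)
            return sock
    raise ValueError("attempts must be at least 1")
```

With the defaults this waits up to about five seconds for the old process to release the port before re-raising the error.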

Solutions 4 & 5 have some very short but nonzero time in between stopping the old script and starting the new. Solution 3 has potentially significant time in between, but is very simple. Solutions 1 & 2 are ways to do this with literally no downtime: any connect call will succeed and get either the old or the new running script.
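
For the hand-over in solution #2, Python 3 exposes `sendmsg()` and `recvmsg()` on socket objects. The sketch below follows the pattern shown in the Python socket documentation; the helper names are hypothetical, and it assumes the old and new processes are already connected over a Unix domain socket (`channel`):

```python
import array
import socket


def send_listener(channel, listener):
    """Old process: pass the listening socket's descriptor over a Unix socket."""
    fds = array.array("i", [listener.fileno()])
    # One byte of ordinary data must accompany the SCM_RIGHTS ancillary data.
    channel.sendmsg([b"L"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_listener(channel):
    """New process: receive the descriptor and wrap it in a socket object."""
    fds = array.array("i")
    msg, ancdata, flags, addr = channel.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any truncated trailing bytes.
            fds.frombytes(data[:len(data) - (len(data) % fds.itemsize)])
    if not fds:
        raise RuntimeError("no file descriptor received")
    return socket.socket(fileno=fds[0])
```

The received socket refers to the same kernel listening socket, so the new process can call `accept()` on it right away while the old one finishes its existing connections and exits.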

P.S. More detail on the behavior of SO_REUSEADDR on different platforms: SO_REUSEADDR doesn't have the same semantics on Windows as on Unix

On Windows, however, that option actually means something quite different. It means that the address should be stolen from any process which happens to be using it at the moment.

I'm not sure if this is what you're running into, but note that as described there the behavior on different versions of Unix is also somewhat different.

Alex I
1

You can add more logic to your startup script to do pre-execution testing and cleanup.

#!/bin/bash
export DISPLAY=:0.0

# If py.py is found running
if pgrep py.py; then
 for n in $(seq 1 9); do
  # kill py.py starting at kill -1 and increase to kill -9
  if ! pgrep py.py; then
   # if no running py.py is found break out of this loop
   break
  fi
  pkill -${n} py.py
  sleep .5
 done
fi

# Verify nothing has tcp/58888 open in a listening state
if lsof -t -i tcp:58888 -stcp:listen; then
 echo process with pid $(lsof -t -i tcp:58888 -stcp:listen) still listening on port 58888, exiting
 exit
fi

java -cp Something.jar System.V &
/var/tmp/py.py &

Eventually you'll probably want to use a full-blown init script and have those processes daemonized. See http://www.thegeekstuff.com/2012/03/lsbinit-script/ for an example. If your processes run as an unprivileged user, the implementation will change slightly, but the overall concepts are the same.
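
If the pre-launch check is ever moved into Python, the listening-port test from the script above could be approximated without lsof. This is a rough Python 3 sketch with a hypothetical helper name. Unlike lsof it cannot say which process holds the port, and it opens a real connection to the server, which the server will see:

```python
import socket


def port_is_listening(host, port, timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    probe.settimeout(timeout)
    try:
        # connect_ex returns 0 on success and an errno value otherwise.
        return probe.connect_ex((host, port)) == 0
    finally:
        probe.close()
```

A launcher would call something like `port_is_listening('localhost', 58888)` and refuse to start the new processes while it returns True.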

Preston
0

Nothing I tried worked. So, to reduce the risk, I started using a file-system (Unix domain) socket instead. Example:

# Echo server program
import socket,os

s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
try:
    os.remove("/tmp/socketname")
except OSError:
    pass
s.bind("/tmp/socketname")
s.listen(1)
conn, addr = s.accept()
while 1:
    data = conn.recv(1024)
    if not data: break
    conn.send(data)
conn.close()


# Echo client program
import socket

s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
s.connect("/tmp/socketname")
s.send('Hello, world')
data = s.recv(1024)
s.close()
print 'Received', repr(data)