
I'm trying to set up a remote backup server for dar, along these lines. I'd really like to do all the piping with python if possible, but I've asked a separate question about that.

Using netcat in subprocess.Popen(cmd, shell=True), I succeeded in making a differential backup, as in the examples on the dar site. The only two problems are:

  1. I don't know how to assign port numbers dynamically this way
  2. If I execute the server in the background, it fails. Why?

Update: This doesn't seem to be related to netcat; it hangs even without netcat in the mix.

Here's my code:

from socket import socket, AF_INET, SOCK_STREAM
import os, sys
import SocketServer
import subprocess

class DarHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        print('entering handler')
        data = self.request.recv(1024).strip()
        print('got: ' + data)
        if data == 'xform':
            cmd1 = 'nc -dl 41201 | dar_slave archives/remotehost | nc -l 41202'
            print(cmd1)
            cmd2 = 'nc -dl 41200 | dar_xform -s 10k - archives/diffbackup'
            print(cmd2)
            proc1 = subprocess.Popen(cmd1, shell=True)
            proc2 = subprocess.Popen(cmd2, shell=True)
            print('sending port number')
            self.request.send('41200')
            print('waiting')
            result = str(proc1.wait())
            print('nc-dar_slave-nc returned ' + result)
            result = str(proc2.wait())
            print('nc-dar_xform returned ' + result)
        else:
            result = 'bad request'
        self.request.send(result)
        print('send result, exiting handler')

myaddress = ('localhost', 18010)
def server():
    server = SocketServer.TCPServer(myaddress, DarHandler)
    print('listening')
    server.serve_forever()

def client():
    sock = socket(AF_INET, SOCK_STREAM)
    print('connecting')
    sock.connect(('localhost', 18010))
    print('connected, sending request')
    sock.send('xform')
    print('waiting for response')
    port = sock.recv(1024)
    print('got: ' + port)
    try:
        os.unlink('toslave')
    except OSError:
        # the fifo did not exist yet, nothing to remove
        pass
    os.mkfifo('toslave')
    cmd1 = 'nc -w3 localhost 41201 < toslave'
    cmd2 = 'nc -w3 localhost 41202 | dar -B config/test.dcf -A - -o toslave -c - | nc -w3 localhost ' + port
    print(cmd2)
    proc1 = subprocess.Popen(cmd1, shell=True)
    proc2 = subprocess.Popen(cmd2, shell=True)
    print('waiting')
    result2 = proc2.wait()
    result1 = proc1.wait()
    print('nc<fifo returned: ' + str(result1))
    print('nc-dar-nc returned: ' + str(result2))
    result = sock.recv(1024)
    print('received: ' + result)
    sock.close()
    print('socket closed, exiting')

if __name__ == "__main__":
    if sys.argv[1].startswith('serv'):
        server()
    else:
        client()

Here's what happens on the server:

$ python clientserver.py serve &
[1] 4651
$ listening
entering handler
got: xform
nc -dl 41201 | dar_slave archives/remotehost | nc -l 41202
nc -dl 41200 | dar_xform -s 10k - archives/diffbackup
sending port number
waiting

[1]+  Stopped                 python clientserver.py serve

Here's what happens on the client:

$ python clientserver.py client
connecting
connected, sending request
waiting for response
got: 41200
nc -w3 localhost 41202 | dar -B config/test.dcf -A - -o toslave -c - | nc -w3 localhost 41200
waiting
FATAL error, aborting operation
Corrupted data read on pipe
nc<fifo returned: 1
nc-dar-nc returned: 1

The client also hangs, and I have to kill it with a keyboard interrupt.

Aryeh Leib Taurog
  • Why does python need to be involved? `dar - | nc` and `nc -l | dar` seems a lot simpler. – Gringo Suave Oct 27 '11 at 22:44
  • @gringo, it _seems_ simpler, but I don't want to have to invoke all my backups manually. Cron isn't smart enough. Coding a backup server in bash would be pretty painful. So what exactly are you suggesting? – Aryeh Leib Taurog Oct 27 '11 at 23:25
  • What's wrong with cron? I don't see the point of using both nc and python for sockets, why not one or the other? – Gringo Suave Oct 28 '11 at 02:56

2 Answers


I'd cut my losses and start over. This solution attempt is very complicated and kludgy. There are many ready-made solutions in the area:

Fwbackups sounds good if you want to take the easy route; rsync+ssh is the hardcore option.

Gringo Suave
  • +1 for pragmatic approach. I unfortunately can't sleep when a technical challenge like this is puzzling me. I considered bacula and amanda, but I figured it would take me almost as long to understand them as it would to roll my own. The problem with rsync, which was also tempting, is it doesn't Do the Right Thing nearly as often as dar, plus with dar I can encrypt my backup and stick them easily on amazon s3. – Aryeh Leib Taurog Oct 28 '11 at 12:25
  1. Use Popen.communicate() instead of Popen.wait().

    The python documentation for wait() states:

    Warning: This will deadlock if the child process generates enough output to a stdout or stderr pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use communicate() to avoid that.

  2. Dar and its related executables should get a -Q if they aren't running interactively.

  3. When synchronizing multiple processes, make sure to call communicate() on the 'weakest link' first: dar_slave before dar_xform and dar before cat. This was done correctly in the question, but it's worth noting.

  4. Clean up shared resources. The client process is holding open a socket from which dar_xform is still reading. Attempting to send/recv data on the initial socket after dar and friends are finished without closing that socket will therefore cause a deadlock.
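
Point 1 can be illustrated with a minimal sketch. It assumes Python 3 (the question's code is Python 2). The child command is a hypothetical stand-in for dar that just writes 1 MiB to stdout, and `run_and_collect` is an illustrative name, not part of any library. Redirecting stdin away from the terminal also matters for a backgrounded server, since a background job that reads from the terminal gets stopped by the shell.

```python
import subprocess
import sys

# Hypothetical stand-in for a chatty child such as dar: it writes 1 MiB to
# stdout, which is more than a typical OS pipe buffer holds.
CHILD = "import sys; sys.stdout.write('x' * (1 << 20))"

def run_and_collect(code):
    """Run a child Python process and return (returncode, stdout bytes)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", code],
        stdin=subprocess.DEVNULL,   # the child never reads from the terminal
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # communicate() drains stdout and stderr while waiting, so the child
    # cannot block on a full pipe. Calling proc.wait() here instead could
    # deadlock once the pipe buffer fills up.
    out, _err = proc.communicate()
    return proc.returncode, out

returncode, output = run_and_collect(CHILD)
print(returncode, len(output))
```

If the output is not needed at all, leaving stdout and stderr unredirected (as the question does) or pointing them at `subprocess.DEVNULL` avoids the pipe entirely.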

Here is a working example which doesn't use shell=True or netcat. An advantage of this is I can have the secondary ports assigned dynamically and therefore could conceivably serve multiple backup clients simultaneously.
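
As an illustration only (this is not the example referred to above), here is a minimal sketch of dynamic port assignment without netcat. It assumes Python 3 on a POSIX system, where a connected socket can be handed to a child process as its stdin. Binding to port 0 lets the kernel pick a free port, and `getsockname()` reports which one it chose, so that number can be sent to the client over the control connection. The names `listen_on_free_port`, `serve_one` and `FILTER` are hypothetical, and the child is a stand-in for dar_xform that merely upper-cases its input.

```python
import socket
import subprocess
import sys

def listen_on_free_port(host="127.0.0.1"):
    """Bind to port 0 so the kernel picks an unused port; return (sock, port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))
    srv.listen(1)
    return srv, srv.getsockname()[1]

def serve_one(srv, argv):
    """Accept one connection, use it as argv's stdin, return the child's stdout."""
    conn, _addr = srv.accept()
    with conn:
        # The socket itself becomes the child's stdin, replacing `nc -l PORT |`.
        proc = subprocess.Popen(argv, stdin=conn, stdout=subprocess.PIPE)
        out, _ = proc.communicate()
    return out

# Hypothetical stand-in for dar_xform: upper-case whatever arrives on stdin.
FILTER = [sys.executable, "-c",
          "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().upper())"]

srv, port = listen_on_free_port()
# In the real server, `port` would be sent to the client at this point.
client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"hello")
client.close()            # closing signals EOF to the child reading the socket
result = serve_one(srv, FILTER)
srv.close()
print(port, result)
```

Because every request gets its own kernel-assigned port, two clients never collide on a fixed number such as 41200, which is what makes serving several backups at once conceivable.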

Aryeh Leib Taurog