How to secure a Python distributed computing layer

Question

These modules are designed to provide a layer of computational capacity across multiple computers. Which proven methods are available to protect against spoofed packets? How can I best make a deep copy of any objects that the transmitted callable references but that are not included with it? Is a function object the best way to encapsulate client jobs? Lastly: can this code be improved? Post script: please excuse my last question. I need to redeem my reputation.

sock.py

from socket import socket
from socket import AF_INET
from socket import SOCK_STREAM
from socket import gethostbyname
from socket import gethostname

class SocketServer:
  def __init__(self, port):
    self.sock = socket(AF_INET, SOCK_STREAM)
    self.port = port
  def send(self, tdata):
    self.sock.bind(("127.0.0.1", self.port))
    self.sock.listen(len(tdata))
    while tdata:
      s = self.sock.accept()[0]
      for x in tdata.pop(): s.send(x)
      s.close()
    self.sock.close()

class Socket:
  def __init__(self, host, port):
    self.sock = socket(AF_INET, SOCK_STREAM)
    self.sock.connect((host, port))
  def recv(self, size):
    return self.sock.recv(size)
  def close(self):
    self.sock.close()

pack.py

#http://stackoverflow.com/questions/6234586/we-need-to-pickle-any-sort-of-callable
from marshal import dumps as marshal_dumps
from pickle import dumps as pickle_dumps
from struct import pack as struct_pack
from hashlib import sha224

class packer:
  def __init__(self):
    self.f = []
  def pack(self, what):
    if type(what) is type(lambda:None):
      self.f = []
      self.f.append(marshal_dumps(what.func_code))
      self.f.append(pickle_dumps(what.func_name))
      self.f.append(pickle_dumps(what.func_defaults))
      self.f.append(pickle_dumps(what.func_closure))
      self.f = pickle_dumps(self.f)
      return (struct_pack('Q', len(self.f)), self.f)
    return None
  def gethash(self):
    hash = sha224(self.f).hexdigest()
    return (struct_pack('Q', len(hash)), hash)
  def getwithhash(self, what):
    a, b = self.pack(what)
    c, d = self.gethash()
    return (a, b, c, d)

unpack.py

from types import FunctionType
from pickle import loads as pickle_loads
from marshal import loads as marshal_loads
from struct import unpack as struct_unpack
from struct import calcsize
from hashlib import sha224

#http://stackoverflow.com/questions/6234586/we-need-to-pickle-any-sort-of-callable

class unpacker:
  def __init__(self):
    self.f = []
    self.fcompiled = lambda:None
    self.sizeofsize = calcsize('Q')
  def unpack(self, sock):
    size = struct_unpack('Q', sock.recv(self.sizeofsize))[0]
    self.f = sock.recv(size)
    size = struct_unpack('Q', sock.recv(self.sizeofsize))[0]
    hash0 = sock.recv(size)
    sock.close()
    hash1 = sha224(self.f).hexdigest()
    if hash0 != hash1: return None
    self.f = pickle_loads(self.f)
    a = marshal_loads(self.f[0])
    b = globals() # TODO
    c = pickle_loads(self.f[1])
    d = pickle_loads(self.f[2])
    e = pickle_loads(self.f[3])
    self.fcompiled = FunctionType(a, b, c, d, e)
    return self.fcompiled

test.py

from unpack import unpacker
from pack import packer
from sock import SocketServer
from sock import Socket
from threading import Thread
from time import sleep

count = 2
port = 4446

def f():
  print 42

def server():
  ss = SocketServer(port)
  pack = packer()
  functions = [pack.getwithhash(f) for nothing in range(count)]
  ss.send(functions)

if __name__ == "__main__":
  Thread(target=server).start()
  sleep(1)
  unpack = unpacker()
  for nothing in range(count):
    print unpack.unpack(Socket("127.0.0.1", port))

output:

<function f at 0x0000000>
<function f at 0x0000000>

Is there any particular reason why you're not using [TLS](http://en.wikipedia.org/wiki/Transport_Layer_Security) with certificates on both sides to guarantee transport integrity? — sarnold, Jun 05 '11 at 23:19
No, (being currently unfamiliar with security) part of the purposes for posing these questions was to ascertain exactly that information. +1 for helpful comment. — motoku, Jun 05 '11 at 23:27
Your question is a little overly broad. Can you target it down any? — Greg, Jun 05 '11 at 23:43
@Greg, what particular question is overly broad (or am I allowed only one question per thread?)? — motoku, Jun 05 '11 at 23:47
Should I post the other questions in separate threads? (edit) I'll edit the title instead. — motoku, Jun 06 '11 at 01:17
@Sean: Good title. Very clear. The part I was referring to was the general 'can this code be improved' question. Not a bad thing to wonder at all, but very broad. — Greg, Jun 06 '11 at 04:45
@sean - you've totally changed the question title and now the question body and answer doesn't match. You should ask a new question about your deep copy thing and revert to the original title about securing the connection. Thanks. — Kev, Jun 09 '11 at 20:13

score 6 · Accepted Answer · answered Jun 06 '11 at 00:02

I've taken a closer look at your code now, and I've got some comments:

This code looks like it can easily protect against accidental modification of the pickled objects while they are in flight. sha224 is an excellent hashing algorithm, and will readily detect data that was accidentally corrupted in a way that still passes the TCP checksum.
This code does not protect against malicious modification of the pickled objects while they are in flight. There is no assurance that packets came from a trusted member of the computing network, nor any assurance that packets have not been modified (or dropped completely).

The use of a hashing algorithm alone cannot prove the source of packets, nor prove that they haven't been maliciously modified: an attacker could simply re-compute the hash after modifying the data and re-send the packet.
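To make that concrete, here is a minimal Python 3 sketch (the question's code is Python 2) of the same check the unpacker performs. The helper names `frame` and `receiver_accepts` are made up for illustration and are not part of the question's modules.

```python
from hashlib import sha224

def frame(payload):
    """Pair a payload with its plain SHA-224 digest, as the question's packer does."""
    return payload, sha224(payload).hexdigest()

def receiver_accepts(payload, digest):
    """The receiver's check: recompute the digest and compare."""
    return sha224(payload).hexdigest() == digest

original, digest = frame(b"print 42")

# Accidental corruption: the payload changes but the digest does not.
accidental_ok = receiver_accepts(b"print 43", digest)

# Malicious change: the attacker swaps the payload AND recomputes the digest.
forged, forged_digest = frame(b"print 'pwned'")
malicious_ok = receiver_accepts(forged, forged_digest)
```

The corrupted payload is rejected, but the forged one is accepted, because anyone can compute an unkeyed hash.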

There are several 'usual approaches' to this problem. You can use a shared secret: a key shared between all the clients participating in the network. The sender uses this key to compute a keyed hash, such as HMAC, over the data, and recipients re-compute the HMAC authentication code with the same key and compare. It's fast and simple (and legal in some jurisdictions that forbid cryptographic software), but the shared key is a giant liability: if any one system is compromised, the key for the whole network is compromised. (Compromised systems might not even be part of your threat model.)
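A minimal sketch of the shared-secret approach, using Python 3's standard `hmac` module. The key value and the `sign`/`verify` helper names are placeholders; a real deployment would generate the key randomly and distribute it out of band.

```python
import hmac
from hashlib import sha256

# Hypothetical shared secret; every node in the network holds the same bytes.
SHARED_KEY = b"replace-with-a-random-secret"

def sign(payload, key=SHARED_KEY):
    """Return the HMAC-SHA256 tag to send alongside the payload."""
    return hmac.new(key, payload, sha256).digest()

def verify(payload, tag, key=SHARED_KEY):
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(key, payload, sha256).digest()
    return hmac.compare_digest(expected, tag)
```

Unlike the plain sha224 digest, an attacker who modifies the payload cannot produce a matching tag without knowing the key. Note that this only authenticates the bytes; unpickling or unmarshalling them is still only as safe as the sender is trustworthy.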

You can also use per-host-pair shared secrets. This works just like the secret shared between all nodes, but if a single client is compromised, only the keys belonging to that one client need to be replaced on the other systems.
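As a rough Python 3 sketch of per-pair secrets, each node could keep a table with one key per peer. The peer names, key values, and helper functions below are invented for illustration; since the secret for a pair is symmetric, the same table entry is used here for both signing and verifying.

```python
import hmac
from hashlib import sha256

# Hypothetical key table held by one node: one secret per peer.
PEER_KEYS = {
    "node-a": b"secret shared with node-a",
    "node-b": b"secret shared with node-b",
}

def sign_for(peer, payload):
    """Tag a payload with the secret shared with this particular peer."""
    return hmac.new(PEER_KEYS[peer], payload, sha256).digest()

def verify_from(peer, payload, tag):
    """Accept only if the tag matches the secret shared with the claimed peer."""
    key = PEER_KEYS.get(peer)
    if key is None:
        return False  # unknown or revoked peer
    expected = hmac.new(key, payload, sha256).digest()
    return hmac.compare_digest(expected, tag)

def revoke(peer):
    """Drop a compromised peer's secret; other peers are unaffected."""
    PEER_KEYS.pop(peer, None)
```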

You can also use public-key cryptography to provide signatures on the packets. Each client has a private key and a corresponding public key that is known to all the clients. A compromised private key still breaks the system, but this approach drastically reduces the number of keys you need to prepare: only one per client, rather than one for every pair of clients (O(N) vs O(N²)).
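The difference in key counts is simple arithmetic; the two function names here are just for illustration.

```python
def pairwise_secret_count(n):
    """Secrets needed when every pair of n clients shares its own key."""
    return n * (n - 1) // 2

def keypair_count(n):
    """Key pairs needed when each of n clients has one private/public pair."""
    return n

# For 100 clients: 4950 pairwise secrets versus 100 key pairs.
counts = (pairwise_secret_count(100), keypair_count(100))
```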

Public key systems are fun to write yourself as a learning experience but horrible to try to program correctly. Protecting against replay attacks, selective message dropping, message slicing/constructing, etc., requires a lot of clever protocol design.
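To give a feel for just one of those problems, here is a Python 3 sketch of basic replay protection layered on an HMAC: a message counter is authenticated together with the payload, and the receiver refuses any counter it has already passed. The wire layout (8-byte counter, 32-byte tag, payload), the key, and the names `seal` and `Receiver` are all assumptions made for this example, not an established protocol.

```python
import hmac
import struct
from hashlib import sha256

KEY = b"replace-with-a-random-secret"  # hypothetical shared secret

def seal(counter, payload, key=KEY):
    """Build counter || tag || payload, with the tag covering counter and payload."""
    header = struct.pack("!Q", counter)
    tag = hmac.new(key, header + payload, sha256).digest()
    return header + tag + payload

class Receiver:
    def __init__(self, key=KEY):
        self.key = key
        self.last_counter = -1  # highest counter accepted so far

    def open(self, message):
        """Return the payload, or None if the message is forged or replayed."""
        if len(message) < 8 + 32:
            return None
        header, tag, payload = message[:8], message[8:40], message[40:]
        expected = hmac.new(self.key, header + payload, sha256).digest()
        if not hmac.compare_digest(expected, tag):
            return None  # modified in flight, or wrong key
        (counter,) = struct.unpack("!Q", header)
        if counter <= self.last_counter:
            return None  # replayed or out-of-order message
        self.last_counter = counter
        return payload
```

Even this sketch leaves gaps: it does not notice dropped messages (that would need a check that each counter is exactly one more than the last), and it says nothing about key rotation or counter reuse across restarts.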

So most people deploy a pre-made transport security scheme such as TLS (its predecessor, SSLv3, has since been deprecated). Combined with client certificates, it can easily provide assurances that both end-points are who they say they are (up to the point of compromised keys, of course), and it ensures that data sent in a TLS-protected stream is delivered in the correct order and without tampering.
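With Python 3's standard `ssl` module, mutual TLS could be set up roughly as follows. The function names are illustrative, and the certificate, key, and CA file paths are left as parameters because they depend entirely on how you issue certificates.

```python
import ssl

def server_context():
    """Server-side context that refuses clients without a valid certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # demand a client certificate
    return ctx

def client_context():
    """Client-side context; PROTOCOL_TLS_CLIENT verifies the server's
    certificate and hostname by default."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx

def load_credentials(ctx, certfile, keyfile, cafile):
    """Load this node's own certificate and key, plus the CA used to
    verify the other side. All three paths are supplied by the caller."""
    ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    ctx.load_verify_locations(cafile=cafile)
    return ctx
```

After loading credentials, the server would wrap each accepted socket with `ctx.wrap_socket(sock, server_side=True)` and the client would use `ctx.wrap_socket(sock, server_hostname=...)` with the name in the server's certificate.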

TLS can be a lot of work to configure properly. You might have just as good success with a simpler tool, such as ssh. Libraries are available so you can control the connections programmatically rather than rely on system-supplied ssh(1) clients and sshd(8) servers.

@Sean, that's a good start; if you don't have any wires leaving the room, it might be all you need. :) — sarnold, Jun 06 '11 at 00:14
