6

I have a number of nodes that will use a secondary service to be informed about the address of each other. I want to be able to publish information so as all the other nodes can hear it. Using an XPUB socket is not an option I would want to go with here as I want this system to be distributed.

What I have tried is something that sums up to:

1 Create a PUB socket,

def pub_stream(self):
     self.pub = self.context.socket(zmq.PUB)
     self.pub.bind(self.endpoint)

2 Create a SUB stream,

def sub_stream(self):
    ioloop = IOLoop.instance()
    socket = self.context.socket(zmq.SUB)
    self.sub_stream = ZMQStream(socket, ioloop)
    self.sub_stream.on_recv(self.on_message)
    self.subs_stream.setsockopt(zmq.SUBSCRIBE, self.topic)

3 At some point receive the addresses of all other nodes and connect to them,

# close and restart sub_stream to get rid of any previous connections
for endpoint in endpoints:
    self.sub_stream.connect(endpoint)

No messages are passed on the on_message callback though. Is what I am doing correct, if not, what is a better way of doing what I want to achieve?

dearn44
  • 3,198
  • 4
  • 30
  • 63

1 Answers1

0

Whatever route you decide you need at least one fixed address to connect to unless you have access to multicast etc.

I would have a simple X(pub/sub) proxy as my separate discovery network allowing new nodes to decide which other nodes are of interest based on subject.

Simple version

  • Create a single XSUB/XPUB proxy with a fixed address on each side (DNS etc)
  • Node starts
    • connects to XSUB port of proxy and broadcasts its subject, address and data port
    • connects to XPUB port and subscribes to nodes subjects of interest
      • With the connection information returned it connects its data socket to the data socket of the node based on the connection information.

Reliable version

  • Add multiple discovery proxies with load balancers/virtual IP's to cover fault tolerance etc.
  • Node should send discovery message on a timer
    • Allow late joining nodes to connect
    • Allow connected nodes to discover failed nodes (other than relying on tcp timeout)

Hybrid version

I reality/production I use zeromq along with whatever other (more suitable) services are available. That way I am not trying to reinvent the wheel (for discovery) and just zeromq for the subscription/data work.

For example if I am distributing my systems over the many regions using a cloud provider why not use the discovery services they provide.

AWS (as one hybrid example)

For AWS I use the following components to allow discovery/reliability/failover

  • Route53 (DNS) with health checks
  • Cloud Map (DNS ish)
  • Elastic load balancers (front end to any custom discovery services)
James Harvey
  • 912
  • 4
  • 6
  • Multicasting is viable if I only needed to operate on my local network or on already known endpoints, which is not the case here. Your simple version is not acceptable for my case but your other suggestion is in fact interesting and is what I have at the moment (being influenced by Skypes' architecture). I am looking for inspiration for a different architecture though. By this I mean a better way to manage my connections to nodes on a completely distributed system taking into account that nodes join and leave at will. – dearn44 Mar 18 '19 at 16:57
  • Those examples were probably over simplified, for real systems I try to use whatever built in resources are available so always end up with a hybrid system. I have updated my comment – James Harvey Mar 19 '19 at 10:38