
I've been reading *Introduction to Algorithms* and started to get a few ideas and questions popping up in my head. The one that's baffled me most is how you would approach designing an algorithm to schedule items/messages in a distributed queue.

My thoughts have led me to browsing Wikipedia on topics such as sorting, message queues, scheduling and distributed hash tables, to name a few.

The scenario: Say you wanted a system that queues messages (strings, or some serialized object, for example). A key feature of this system is to avoid any single point of failure. The system would be distributed across multiple nodes within some cluster and would spread the workload as evenly as possible across those nodes to avoid hotspots.

You want to avoid a master/slave design for replication and scaling (no single point of failure). The system avoids writing to disk entirely and maintains only in-memory data structures.

Since this is meant to be a queue of some sort, the system should be able to use varying scheduling algorithms (FIFO, earliest deadline first, round robin, etc.) to determine which message should be returned on the next request, regardless of which node in the cluster the request is made to.
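To make the idea concrete, here's a rough single-node sketch of what I mean by swappable scheduling policies (all class and method names are made up for illustration):

```python
import heapq
from collections import deque

class FifoPolicy:
    """Return messages in arrival order."""
    def __init__(self):
        self._q = deque()
    def push(self, msg, **meta):
        self._q.append(msg)
    def pop(self):
        return self._q.popleft()

class EarliestDeadlinePolicy:
    """Return the message with the nearest deadline first."""
    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker: equal deadlines stay FIFO
    def push(self, msg, deadline=float("inf"), **meta):
        heapq.heappush(self._heap, (deadline, self._seq, msg))
        self._seq += 1
    def pop(self):
        return heapq.heappop(self._heap)[2]

# a node could hold any policy behind the same push/pop interface
queue = EarliestDeadlinePolicy()
queue.push("backup", deadline=100)
queue.push("alert", deadline=1)
print(queue.pop())  # "alert": the nearest deadline wins
```

The hard part, of course, is making `pop` give the globally correct answer when the candidate messages live on different nodes.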

My initial thoughts: I can imagine how this would work on a single machine, but when I start thinking about how you'd distribute something like this, questions come up such as:

How would I hash each message?

How would I know which node a message was sent to?

How would I schedule each item so that I can determine which message and from which node should be returned next?

I started reading about distributed hash tables and how projects like Apache Cassandra use some form of consistent hashing to distribute data, but then I thought: since the query won't supply a key, I need to know where the next item is and just supply it... This led me into reading about peer-to-peer protocols and how they approach the synchronization problem across nodes.
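For reference, the consistent-hashing part on its own is small enough to sketch: a toy ring with virtual nodes (all names are hypothetical; this is the general technique, not Cassandra's actual implementation):

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring: each node appears at several points
    ("virtual nodes") so load spreads evenly; a key belongs to the
    first node clockwise from the key's hash."""
    def __init__(self, nodes, vnodes=64):
        self._ring = []
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        i = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[i][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("message-42"))  # always the same node for this key
```

Which is exactly where my problem starts: `node_for` needs a key, and a plain dequeue request doesn't come with one.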

So my question is: how would you approach a problem like the one described above, or is this too far-fetched and simply a stupid idea...?

Just an overview: pointers, different approaches, the pitfalls and benefits of each, and the technologies/concepts/designs/theory that may be appropriate. Basically, anything that could be of use in understanding how something like this might work.

And if you're wondering: no, I'm not intending to implement anything like this. It just popped into my head while reading (it happens, I get distracted by wild ideas when I read a good book).

UPDATE

Another interesting point that would become an issue is distributed deletes. I know systems like Cassandra have tackled this by implementing hinted handoff, read repair and anti-entropy, and it seems to work well, but are there any other (viable and efficient) means of tackling this?
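The common trick behind these mechanisms, as I understand it, is that a delete doesn't remove anything immediately; it writes a tombstone that wins over older writes when replicas reconcile. A toy last-writer-wins sketch of the idea (hypothetical names, nothing like Cassandra's real code):

```python
import time

class ReplicaStore:
    """Toy replica using tombstones: a delete is recorded as a special
    value with a timestamp, so an anti-entropy merge can tell a delete
    apart from a key the other replica simply never saw."""
    TOMBSTONE = object()

    def __init__(self):
        self._data = {}  # key -> (timestamp, value or TOMBSTONE)

    def put(self, key, value, ts=None):
        self._merge(key, time.time() if ts is None else ts, value)

    def delete(self, key, ts=None):
        self._merge(key, time.time() if ts is None else ts, self.TOMBSTONE)

    def _merge(self, key, ts, value):
        cur = self._data.get(key)
        if cur is None or ts > cur[0]:  # last writer wins
            self._data[key] = (ts, value)

    def get(self, key):
        cur = self._data.get(key)
        return None if cur is None or cur[1] is self.TOMBSTONE else cur[1]

    def sync_from(self, other):
        # anti-entropy: pull the peer's entries, keeping newer timestamps
        for key, (ts, value) in other._data.items():
            self._merge(key, ts, value)

a, b = ReplicaStore(), ReplicaStore()
a.put("msg", "hello", ts=1)
b.sync_from(a)
b.delete("msg", ts=2)
a.sync_from(b)
print(a.get("msg"))  # None: the delete propagated, not the old write
```

The catch is that tombstones have to be garbage-collected eventually, which is its own can of worms.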

zcourts
  • I'm slightly overwhelmed, but I shall down a coffee and look again soon. Thank You Very Much – Caffeinated Oct 05 '11 at 19:53
    How much duplication of work do you expect/are you willing to have? By that I mean that you schedule some work and two or more different computers start computing it. If you don't want any duplication, that would be hard to achieve. Even if you do something like that on a single computer, where you have a single work queue, you (usually) have to use locks. – svick Oct 05 '11 at 20:29
  • I think it'd be ideal if there were no fixed limit on the replication. Being familiar with Apache Cassandra, which supports a tunable replication factor, the replication idea actually stemmed from that. But I'm just as interested in knowing the difficulties associated with having something like this over, say, just sticking a hard limit in and letting a message be replicated just once somewhere... – zcourts Oct 05 '11 at 20:39
  • I think you misunderstood me. The problem I was talking about is if you wanted *no* replication of the work done. I think doing that would be quite difficult in your decentralized setting. – svick Oct 05 '11 at 22:18
  • @svick: there are some reduction possibilities, see my answer below. – DaveFar Oct 05 '11 at 22:25
  • @svick Sorry, I did indeed misunderstand. I'm not sure, I think one approach could use a leader election algorithm as suggested by DaveBall where for example, the node that receives the write request has a better chance of becoming the leader unless it's under high load. The leader then co-ordinates the work on that message and for any other messages it was elected leader of. In doing so the only work other nodes need to do is store a copy of the message, knowing its elected leader. – zcourts Oct 06 '11 at 06:31
  • It may make sense for the leader to also update all replicas (or the entire cluster, depending on the partition algorithm that decides which nodes get a particular message), letting them know which other nodes have a copy. That way, if the elected leader of a message is down when a read request is received, the rest of the cluster also knows the other locations of the message within the cluster. – zcourts Oct 06 '11 at 06:32

1 Answer


Overview, as you wanted

There are some popular techniques for distributed algorithms, e.g. using clocks, waves or general purpose routing algorithms.

You can find these in the great distributed algorithms books *Introduction to Distributed Algorithms* by Tel and *Distributed Algorithms* by Lynch.

Reductions

Reductions are particularly useful, since general distributed algorithms can become quite complex. You might be able to reduce your problem to a simpler, more specific case.

If, for instance, you want to avoid having a single point of failure, but a symmetric distributed algorithm is too complex, you can use the standard distributed algorithm for (leader) election and afterwards run a simpler asymmetric algorithm, i.e. one that can make use of a master.
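For illustration, here is a toy Chang-Roberts election on a unidirectional ring, simulated synchronously in one process (a sketch of the idea, not production code):

```python
def elect_leader(node_ids):
    """Chang-Roberts on a unidirectional ring: every node sends its id
    clockwise; a node forwards ids larger than its own, swallows smaller
    ones, and the node that receives its own id back is the leader."""
    n = len(node_ids)
    inbox = list(node_ids)  # round 0: each node i sends node_ids[i]
    for _ in range(n):
        outbox = [None] * n
        for i, msg in enumerate(inbox):
            if msg is None:
                continue
            nxt = (i + 1) % n  # message travels to the next node on the ring
            if msg == node_ids[nxt]:
                return msg          # id came full circle: highest id wins
            if msg > node_ids[nxt]:
                outbox[nxt] = msg   # forward larger ids
            # smaller ids are swallowed
        inbox = outbox
    return max(node_ids)  # fallback, not reached on a well-formed ring

print(elect_leader([3, 1, 4, 2]))  # 4
```

Once elected, the leader can coordinate replication or scheduling until it fails, at which point a new election runs.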

Similarly, you can use synchronizers to run an algorithm designed for a synchronous network model on an asynchronous one.

You can use snapshots to analyze the system offline instead of having to deal with varying online process states.
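As a sketch of the snapshot idea, here is a miniature Chandy-Lamport run over FIFO channels, simulated in a single process (the node names and choice of initiator are illustrative):

```python
from collections import deque

MARKER = "MARKER"

def snapshot(states, channels, initiator):
    """Chandy-Lamport: the initiator records its state and sends markers;
    each node records its state on the first marker it sees and logs a
    channel's in-flight messages until that channel delivers its marker."""
    recorded, chan_log, open_chans = {}, {}, set()

    def record(node):
        recorded[node] = states[node]
        for (src, dst), q in channels.items():
            if dst == node:             # start recording incoming channels
                open_chans.add((src, dst))
                chan_log[(src, dst)] = []
            if src == node:             # send marker on outgoing channels
                q.append(MARKER)

    record(initiator)
    progress = True
    while progress:                     # deliver until all channels drain
        progress = False
        for chan, q in channels.items():
            if not q:
                continue
            msg, progress = q.popleft(), True
            if msg == MARKER:
                if chan[1] not in recorded:
                    record(chan[1])
                open_chans.discard(chan)    # this channel's state is complete
            elif chan in open_chans:
                chan_log[chan].append(msg)  # in flight at snapshot time
    return recorded, chan_log

states = {"A": 1, "B": 2}
channels = {("A", "B"): deque(), ("B", "A"): deque(["m"])}
rec, logs = snapshot(states, channels, "A")
print(rec, logs[("B", "A")])  # both node states plus the in-flight "m"
```

Note that this captures a consistent cut, not the "current" global state; under very high read/write rates that distinction is exactly what keeps the offline analysis sound.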

DaveFar
  • *Introduction to Distributed Algorithms* by Tel looks like an interesting read. I like the idea of using a leader election algorithm; I can see how it would make replication easier. I'm not so sure about the snapshots: in a case where the system has extremely high read/write rates, wouldn't a snapshot algorithm lead to some items being out of date, because an item has been pushed out of the queue by a read request but replication has taken too long to propagate? Another thing I've just thought of is how distributed deletes would be handled. I'll update my question. – zcourts Oct 06 '11 at 06:15