Let's say I have a SQL schema like this:

create table master_lock(job_type int primary key, ip varchar);

Now each node tries to insert a row like this:

insert into master_lock values(1, '192.168.1.1');
insert into master_lock values(1, '192.168.1.2');

Whichever node successfully inserts its row can now be considered the master. We can add a timestamp column to the table so that stale locks can be removed.
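A minimal sketch of that insert race, assuming an in-memory SQLite database stands in for the shared SQL server. The helper name `try_become_master` is hypothetical; the table and values are the ones above. The primary key on `job_type` is what makes the second insert fail.

```python
import sqlite3

def try_become_master(conn, job_type, ip):
    """Return True if this node's insert won the primary-key race."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "insert into master_lock(job_type, ip) values (?, ?)",
                (job_type, ip),
            )
        return True
    except sqlite3.IntegrityError:
        # Another node already holds the row for this job_type.
        return False

# SQLite is only a stand-in here; a real deployment would point every
# node at the same database server.
conn = sqlite3.connect(":memory:")
conn.execute("create table master_lock(job_type int primary key, ip varchar)")

first = try_become_master(conn, 1, "192.168.1.1")
second = try_become_master(conn, 1, "192.168.1.2")
```

Only the first caller gets `True`; every later caller hits the constraint violation and stays a follower.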

With this scheme, I can easily elect a master node. Why do I need Paxos/Raft etc?

user375868
  • 2
    How do you handle an unclean shutdown of the current master if it does not clean up the table? I think the approach with _locking_ an existing row and not releasing that lock is safer, because if the application dies, the connection dies and the lock is automatically released. –  Feb 02 '16 at 23:16
  • I was thinking of adding a timestamp column to the SQL table. When a master acquires the lock, it can update the timestamp every n seconds. If the master has an unclean shutdown, the timestamp won't be updated and the other nodes can assume the master is dead. But yeah, I agree, not releasing the lock is a good idea too. – user375868 Feb 03 '16 at 00:30

1 Answer

Who says you need Paxos/Raft/etc? What you describe is essentially a distributed, atomic Compare-And-Swap operation. You could use any number of mechanisms to do so and a SQL database will work just fine. Your idea for adding in an additional timestamp that must be continually refreshed to retain master status is a common pattern in this arena and it's often referred to as "Master Leases".
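As a rough sketch of a master lease built on that compare-and-swap idea, the snippet below uses SQLite as a stand-in for the shared database. The `expires_at` column, the `acquire_or_renew` helper, and the 10-second lease length are all assumptions made for illustration, and `now` is passed in explicitly so the behaviour is easy to follow. In practice you would probably want the database's own clock rather than each node's.

```python
import sqlite3

LEASE_SECONDS = 10  # hypothetical lease length

def acquire_or_renew(conn, job_type, ip, now, ttl=LEASE_SECONDS):
    """Take or extend the lease; return True if `ip` is master afterwards."""
    with conn:  # one transaction per attempt
        # Compare-and-swap: succeeds only if we already hold the lease
        # or the current holder's lease has expired.
        cur = conn.execute(
            "update master_lock set ip = ?, expires_at = ? "
            "where job_type = ? and (ip = ? or expires_at <= ?)",
            (ip, now + ttl, job_type, ip, now),
        )
        if cur.rowcount == 1:
            return True
        try:
            # No row matched: either nobody has claimed the job yet,
            # or a live lease exists and the primary key rejects us.
            conn.execute(
                "insert into master_lock(job_type, ip, expires_at) "
                "values (?, ?, ?)",
                (job_type, ip, now + ttl),
            )
            return True
        except sqlite3.IntegrityError:
            return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table master_lock("
    "job_type int primary key, ip varchar, expires_at real)"
)

a = acquire_or_renew(conn, 1, "192.168.1.1", now=0)   # first claim
b = acquire_or_renew(conn, 1, "192.168.1.2", now=5)   # lease still live
c = acquire_or_renew(conn, 1, "192.168.1.1", now=8)   # holder renews
d = acquire_or_renew(conn, 1, "192.168.1.2", now=20)  # lease has lapsed
```

The master calls `acquire_or_renew` well inside each lease period; if it dies, the row simply ages out and the next caller takes over.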

Depending upon your application and its intended operating environment, using a designated third party to arbitrate between peers (which is the role the SQL database fills in your example) might be your best option. It introduces a single point of failure but it's super simple and periodic downtime maintenance windows may be tolerable, again depending on the application. The potential advantage of something like Raft or Multi-Paxos is that there is no single point of failure. As long as a quorum of peers is available, the ability to choose and maintain the master peer is available. The up-front implementation is probably an order of magnitude more complex but you remove the single point of failure and gain a measure of overall architectural simplicity by removing the concept of the designated 3rd party.
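To put numbers on "a quorum of peers", here is a small sketch that assumes the usual majority quorum (more than half of the cluster). The function names are made up for illustration.

```python
def quorum_size(cluster_size):
    """Smallest number of peers that forms a strict majority."""
    return cluster_size // 2 + 1

def tolerated_failures(cluster_size):
    """How many peers can be down while a majority is still reachable."""
    return cluster_size - quorum_size(cluster_size)

three = (quorum_size(3), tolerated_failures(3))
four = (quorum_size(4), tolerated_failures(4))
five = (quorum_size(5), tolerated_failures(5))
```

A 5-node cluster keeps electing masters with 2 nodes down, whereas the single SQL arbiter tolerates zero failures. Going from 3 to 4 nodes buys nothing, which is why odd cluster sizes are typical.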

Ultimately, it depends on what you're trying to do and the level of robustness you need. Is the maintenance burden and potential downtime of a SPOF a deal breaker for your app? If yes, go Raft/Multi-Paxos. If not, honor the KISS principle and go the designated 3rd party route.

Rakis