
I am building a project in which I need a way to elect a leader among a group of processes. When the leader fails, a new leader must be elected. This must work for processes running on different nodes.
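For illustration only, the simplest possible scheme is for every process to apply the same deterministic rule, such as "the live member with the smallest ID is the leader", and to re-apply it whenever membership changes. The Python sketch below assumes each process has an accurate view of which members are alive (a hypothetical failure detector), and `elect_leader` is a made-up name. It does not handle network partitions: if the cluster splits, each side elects its own leader.

```python
# Naive, illustrative leader election: every process independently picks
# the live member with the smallest ID. Assumes an accurate, shared view of
# which members are alive (hypothetical failure detector). Does NOT protect
# against split brain during a network partition.

from typing import Iterable, Optional


def elect_leader(members: Iterable[str], alive: set[str]) -> Optional[str]:
    """Return the smallest-ID member currently considered alive, or None."""
    candidates = sorted(m for m in members if m in alive)
    return candidates[0] if candidates else None


if __name__ == "__main__":
    members = ["node_a", "node_b", "node_c"]
    alive = {"node_a", "node_b", "node_c"}
    print(elect_leader(members, alive))  # node_a

    # The leader fails; every survivor re-runs the same rule.
    alive.discard("node_a")
    print(elect_leader(members, alive))  # node_b
```

Because every process runs the same rule on the same input, they agree on the leader without exchanging votes, but only as long as their views of `alive` agree.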

After a couple of web searches I did not find a simple solution to this problem, so I was wondering how people in the Erlang community solve it. It seems like such a fundamental problem that there must be some battle-tested library or approach for it.

Let me know how you would do this.

Thank you!

jbernardo
  • If you want to experiment, get a Raft library (there are several). If it needs to be rock solid, use an external service like ZooKeeper. I think these are pretty much the mainstream options. – cdegroot Feb 23 '17 at 21:39
  • What you're asking for is the Raft algorithm. I'm not aware of any _official_ implementation of it in either Erlang or Elixir. – Onorio Catenacci Feb 24 '17 at 01:06
  • Here's a list of [raft implementations](https://raft.github.io/). [Rafter](https://github.com/andrewjstone/rafter) is written by one of the Basho guys, and Basho's Riak uses a Raft algorithm, so that might be a good option. I'm just guessing, though, and have never used a Raft algorithm myself. – popedotninja Feb 24 '17 at 04:32
  • Raft, Paxos, Multi-Paxos and Flexible Paxos are the algorithms. As stated, several implementations exist. I'm not sure what exact requirements you have, though, so you might want to elaborate on that. (For instance, is split brain a problem for you?) – Marc Lambrichs Feb 24 '17 at 11:16
  • Thank you all! Split brain is indeed a problem. I guess I will need to implement one of the algorithms myself, because none of the current implementations seem to be ready and tested for production. – jbernardo Feb 24 '17 at 13:23
  • One very important question here is whether you really need full-fledged leader election, or whether your problem can be solved with simpler constructs. That might well be the case, but we can't tell without more details about the specific problem you are trying to solve. Regarding your last comment, I am not sure implementing a consensus algorithm from scratch is the way to go. Contributing to an existing library may save you time, you'd have someone reviewing your changes _and_ you'd be giving back to the FLOSS community ;-) – Patrick Oscity Feb 25 '17 at 08:02
  • @PatrickOscity Completely agree, I will consider contributing to an existing solution. Regarding the details, I am trying to build a distributed cron system: the idea is to have several cron instances and only one of them performing the work. Does this help? – jbernardo Feb 26 '17 at 13:21
  • OK, this definitely helps me understand your problem much better. I am no expert in distributed systems, but from what I know, here is what I think: a consensus protocol seems like a good place to start, but leader election alone will not guarantee that a job is executed only once across the cluster. In a netsplit, e.g. with Raft, there can be more than one leader, so you would need to replicate to a majority of nodes (in other words, reach consensus on) the information about which node handles which job. – Patrick Oscity Feb 26 '17 at 23:29
  • For the specific case of a cron-like job, https://github.com/sorentwo/oban#periodic-jobs could work. "Jobs are considered unique for most of each minute, which prevents duplicate jobs with multiple nodes and across node restarts." – Nathan Long Jul 08 '20 at 15:18
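To illustrate the majority point from the comments: if a node may act as leader (or record which node owns a job) only after a strict majority of the configured cluster acknowledges it, the two sides of a netsplit can never both succeed, because two disjoint groups cannot each contain more than half of the nodes. A minimal Python sketch, assuming a fixed, known cluster membership; `has_quorum` and `sides_that_can_lead` are hypothetical helpers, not part of any Raft library's API.

```python
# Sketch: why requiring a strict majority prevents two leaders (split brain).
# Assumes a fixed, known cluster size. Helper names are hypothetical.


def has_quorum(reachable: int, cluster_size: int) -> bool:
    """True if `reachable` nodes (counting this one) form a strict majority."""
    return reachable > cluster_size // 2


def sides_that_can_lead(partition_sizes: list[int]) -> list[int]:
    """Given the sizes of the disjoint groups after a netsplit, return the
    indices of the groups that could still elect a leader."""
    cluster_size = sum(partition_sizes)
    return [i for i, size in enumerate(partition_sizes)
            if has_quorum(size, cluster_size)]


if __name__ == "__main__":
    print(sides_that_can_lead([3, 2]))     # [0]  majority side keeps a leader
    print(sides_that_can_lead([2, 2]))     # []   even split: nobody can lead
    print(sides_that_can_lead([1, 1, 3]))  # [2]
```

The even split shows the cost of this safety: when no side has a majority, no side can safely run the job, which is one reason odd cluster sizes are common.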
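The Oban suggestion relies on a simpler idea than leader election: every node may try to schedule the job, but a uniqueness constraint keyed on the job name and the scheduled minute lets only the first attempt win. A minimal Python sketch of that idea, assuming a shared store with an atomic insert-if-absent; here a lock-protected set stands in for something like a database unique index. `JobStore`, `minute_bucket` and `try_schedule` are hypothetical names, and this is not Oban's API.

```python
import threading
from datetime import datetime, timezone


class JobStore:
    """Stand-in for a shared store with an atomic insert-if-absent
    (e.g. a database table with a unique index). Hypothetical."""

    def __init__(self) -> None:
        self._keys: set[tuple[str, str]] = set()
        self._lock = threading.Lock()

    def insert_unique(self, key: tuple[str, str]) -> bool:
        """Insert `key` and return True, or return False if it already exists."""
        with self._lock:
            if key in self._keys:
                return False
            self._keys.add(key)
            return True


def minute_bucket(ts: datetime) -> str:
    """Truncate an aware timestamp to the minute in UTC, e.g. '2017-02-26T13:21'."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M")


def try_schedule(store: JobStore, job: str, now: datetime) -> bool:
    """Every node calls this on its cron tick; one call per (job, minute) wins."""
    return store.insert_unique((job, minute_bucket(now)))


if __name__ == "__main__":
    store = JobStore()
    tick = datetime(2017, 2, 26, 13, 21, 5, tzinfo=timezone.utc)
    # Three nodes fire the same cron entry within the same minute.
    results = [try_schedule(store, "nightly_report", tick) for _ in range(3)]
    print(results)  # [True, False, False]
```

The guarantee is only as strong as the shared store: the insert-if-absent must be atomic across nodes, which a database unique constraint can provide and an in-memory set on a single node cannot.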
