I'm trying to implement a cluster using Erlang as the glue that holds it all together. I like the idea that it creates a fully connected graph of nodes, but upon reading different articles online, it seems as though this doesn't scale well (having a max of 50 - 100 nodes). Did the developers of OTP impose this limitation on purpose? I do know that you can setup nodes to have explicit connections only as well as have hidden nodes, etc. But, it seems as though the default out-of-the-box setup isn't very scalable.
So to the questions:
If you had 5 nodes (A, B, C, D, E) that all had explicit connections such that A-B-C-D-E. Does Erlang/OTP allow A to talk directly to E or does A have to pass messages from B through D to get to E, and thus that's the reason for the fully connected graph? Again, it makes sense but it doesn't scale well from what I've seen.
If one was to try and go for a scalable and fault-tolerant system, what are your options? It seems as though, if you can't create a fully connected graph because you have too many nodes, the next best thing would be to create a tree of some kind. But, this doesn't seem very fault-tolerant because if the root or any parent of children nodes dies, you would lose a significant portion of your cluster.
In looking into supervisors and workers, all of the examples I've seen apply this to processes on a single node. Could it be applied to a cluster of nodes to help implement fault-tolerance?
Can nodes be part of several clusters?
Thanks for your help, if there is a semi-recent website or blogpost (roughly 1-year old) that I've missed, I'd be happy to look at those. But, I've scoured the internet pretty well.