We have a cluster of machines (around 50 and growing). Each machine has a search index that needs to be updated multiple times per day. We currently update the index on each machine individually, but ideally we would update it on one machine and then sync the new files to the rest of the cluster. We initially used rsync for this, but as the number of machines grew, it became apparent that this solution won't scale. I have just started researching multicast file transfers. Can anyone with experience here suggest some places to look?
http://uftp-multicast.sourceforge.net looks promising but I have no experience with it. – Grant May 14 '14 at 20:40
What option did you go with? – ewwhite Nov 17 '14 at 01:39
5 Answers
This was an interview question once for me...
Multicast options:
- BitTorrent
- Other protocols using pub/sub messaging.
Another approach... Use a distribution tree:
Send to N hosts, each of which in turn sends to N more hosts, and work down the tree that way. That would of course require some development work on your side, but it's possibly the more scalable approach.
Most of this depends on how many systems you'll actually need to cater to, the index size and your networking infrastructure.
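To make the distribution tree idea concrete, here is a minimal sketch of how the push plan could be computed. It assumes hosts are simply numbered in a list with the first one acting as the source; the names `plan_distribution`, `tree_depth`, the `nodeNN` hostnames, and the fanout of 4 are all made up for illustration, and the actual file copy (rsync, scp, or whatever you already use) is left out.

```python
from collections import defaultdict


def plan_distribution(hosts, fanout):
    """Map each sending host to the hosts it pushes to.

    hosts[0] is the root (the machine holding the fresh index).
    Hosts are laid out like a k-ary heap: the parent of the host
    at position i is the host at position (i - 1) // fanout.
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    children = defaultdict(list)
    for i in range(1, len(hosts)):
        parent = hosts[(i - 1) // fanout]
        children[parent].append(hosts[i])
    return dict(children)


def tree_depth(num_hosts, fanout):
    """Number of hops from the root to the deepest host."""
    depth, reached, level_size = 0, 1, 1
    while reached < num_hosts:
        level_size *= fanout   # hosts on the next level down
        reached += level_size  # total hosts covered so far
        depth += 1
    return depth


# Hypothetical cluster of 50 machines, each sender pushing to 4 others.
hosts = [f"node{i:02d}" for i in range(50)]
plan = plan_distribution(hosts, fanout=4)

print(plan["node00"])             # hosts the source pushes to directly
print(tree_depth(len(hosts), 4))  # number of sequential push rounds
```

Each host would start pushing to its own children as soon as its copy is complete, so the total time grows with the depth of the tree rather than with the number of machines.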

This was an operational problem for me. I implemented a distribution tree for an enterprise NIS space that had over 200 slave servers being pushed from one master. The hourly update was taking over two hours... I added a servers NIS map (key=server name, value = update_responsible_server). With a fanout of about 8, we had update time under two minutes. By making small mods to the existing NIS distribution scripts, design, test, and implementation took less than a day. – mpez0 May 15 '14 at 17:22
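As a rough illustration of the map described in this comment (key = server name, value = the server responsible for updating it), something like the following would produce it. This is only a sketch and not the actual NIS scripts: the function names, the `master` and `slaveNNN` hostnames, and the heap-style assignment of parents are assumptions; only the fanout of 8 and the 200 slaves come from the comment.

```python
def responsibility_map(master, slaves, fanout):
    """Return {server: update_responsible_server} for every slave.

    Servers are ordered master first; the server at position i is
    updated by the server at position (i - 1) // fanout.
    """
    order = [master] + list(slaves)
    return {order[i]: order[(i - 1) // fanout] for i in range(1, len(order))}


def hops_to_master(server, rmap):
    """Count how many pushes separate a server from the master."""
    hops = 0
    while server in rmap:  # the master has no entry in the map
        server = rmap[server]
        hops += 1
    return hops


# Hypothetical names; 200 slaves and a fanout of 8 as in the comment.
slaves = [f"slave{i:03d}" for i in range(1, 201)]
rmap = responsibility_map("master", slaves, fanout=8)

print(rmap["slave001"])                              # who updates slave001
print(max(hops_to_master(s, rmap) for s in slaves))  # deepest chain of pushes
```

With these numbers no slave is more than three pushes away from the master, which is consistent with the update time dropping from hours to minutes.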
You might be better served by using a shared filesystem, especially if the search index is read-only for the app using it (i.e. at the destination end). That way, much of the complexity is handled for you.

Try BitTorrent. It's designed to spread files over multiple hosts quickly. Multicast is going to make your network engineer scream in pain :)

I'd recommend looking at git. I've used it in the past to make changes on one server and then either push them out or have cron jobs on the other servers pull them down. There is quite a bit of flexibility with this solution.
