I have a set of mostly connected nodes that need to sync a pooled dataset. The files range from 200 to 1500 KB and update at irregular intervals, anywhere from 30 minutes to 6 hours depending on the environment. Right now the number of nodes is in the hundreds, but ideally that will grow.
Currently, I am using libtorrent to keep a series of files in sync between a cluster of nodes. I do a dump every few hours and create a new torrent based on the prior one. I then associate the two using the strategy of BEP 38. The infohash is then posted to a known entry in the DHT, which the other nodes poll to pick it up.
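To make the association step concrete, here is a minimal sketch of what I mean, assuming the link is expressed through the BEP 38 `similar` key in the info dict. In practice libtorrent builds the torrent; the `bencode`, `build_info`, and `infohash` helpers below are hypothetical names used only to illustrate the structure, and the single-file layout and 16 KiB piece length are assumptions.

```python
import hashlib


def bencode(value):
    """Minimal bencoder for ints, bytes, str, lists and dicts (illustrative only)."""
    if isinstance(value, bool):
        raise TypeError("bool is not a bencode type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must be emitted in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("unsupported type: %r" % type(value))


def build_info(name, payload, piece_length=16384, previous_infohash=None):
    """Build a single-file info dict; optionally point at the prior dump."""
    pieces = b"".join(
        hashlib.sha1(payload[i:i + piece_length]).digest()
        for i in range(0, len(payload), piece_length)
    )
    info = {
        "name": name,
        "length": len(payload),
        "piece length": piece_length,
        "pieces": pieces,
    }
    if previous_infohash is not None:
        # Assumption: the prior torrent is referenced via the BEP 38 "similar" list.
        info["similar"] = [previous_infohash]
    return info


def infohash(info):
    """The v1 infohash is the SHA-1 of the bencoded info dict."""
    return hashlib.sha1(bencode(info)).digest()


if __name__ == "__main__":
    old = build_info("dump-1.bin", b"a" * 40000)
    new = build_info("dump-2.bin", b"b" * 40000, previous_infohash=infohash(old))
    print(infohash(new).hex())
```

The resulting 20-byte value is what gets posted to the known DHT entry. Note that putting `similar` inside the info dict changes the infohash, which is what I want here since every dump is a distinct torrent anyway.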
I am wondering if there is a better way to do this. I originally chose BitTorrent for firmware updates: I do not need to worry about nodes with less-than-awesome connectivity, and with the DHT the swarm can self-assemble reasonably well. It was then extended to sync these pooled files.
I am currently trying to see if I can make an extension that would let each node do an announce_peer for each new record. Then, in theory, interested parties would be able to listen for that. That brings up two big issues:
- How do I let the interested nodes know that there is new data?
- If I have a thousand or more nodes adding new infohashes every few minutes, what will that do to the DHT?
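To put rough numbers on the second issue, here is a back-of-envelope sketch. The node count and interval come from the scenario above; the replication factor of 8 is an assumption based on the bucket size K in BEP 5, and the estimate ignores the lookup traffic that precedes each announce. The function names are hypothetical.

```python
def announce_load(nodes, interval_minutes, replication=8):
    """Return (announces per second, store requests per second) across the DHT."""
    announces_per_second = nodes / (interval_minutes * 60)
    # Assumption: each announce is stored on `replication` DHT nodes.
    stores_per_second = announces_per_second * replication
    return announces_per_second, stores_per_second


def new_infohashes_per_day(nodes, interval_minutes):
    """Number of distinct infohashes introduced per day."""
    return nodes * (24 * 60) // interval_minutes


if __name__ == "__main__":
    per_sec, stores = announce_load(1000, 5)
    print(f"{per_sec:.2f} announces/s, {stores:.2f} stores/s")
    print(new_infohashes_per_day(1000, 5), "new infohashes/day")
```

With 1000 nodes announcing every 5 minutes, that is about 3.3 announces per second and 288,000 new infohashes per day, before counting any lookup traffic.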
I will admit it feels like I am trying to drive a square peg into a round hole, but I really would like to keep as few protocols in play as possible.