I have a distributed application where I need to get updated information from other clients, continuously.

The only solution I can think of is to make timestamp based keys, so that updates/puts to the DHT would be of the form:

[long millis, data]

So when a client starts up, it has a last_checked_timestamp and scans all data written after that timestamp, possibly rescanning at regular intervals.

But this causes a whole host of problems with fetching, because I no longer know the specific keys to fetch, only a range of keys. I tried a key-range algorithm in which the keys are intervals, but it didn't work well, and I'd like to find a reference implementation that shows how this should work correctly.

My application needs to fetch updated data, i.e. data within the range (last_checked_time, current_time).

Thanks in advance.

dessalines

1 Answer

In principle you can build arbitrary data structures on top of a DHT. Since a DHT provides key-value lookups, you can use the value part as a pointer to other keys. And with pointers you can build lists, doubly linked lists, skip lists, B+ trees or pretty much any other data structure.

The keys you point to can be randomly distributed across the keyspace (to spread load) or clustered (to speed up lookups).

So instead of trying to derive new keys for current data you can just do the opposite: have one fixed key as entry point into the data structure (head/root pointer) and update it accordingly as new data gets added.
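As a rough illustration of the head-pointer idea, here is a minimal Python sketch of a singly linked list stored in a key-value store. Everything named here is an assumption for illustration: `FakeDHT` is an in-memory stand-in rather than a real DHT client, `HEAD_KEY` is a hypothetical fixed key, and the sketch assumes a single writer, so concurrent updates to the head are not handled.

```python
import hashlib
import json


class FakeDHT:
    """In-memory stand-in for a DHT: exact-key put/get only (hypothetical)."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)


HEAD_KEY = "updates/head"  # hypothetical fixed entry-point key


def append_update(dht, millis, data):
    """Store a new entry pointing at the previous head, then move the head."""
    prev_key = dht.get(HEAD_KEY)
    entry = {"millis": millis, "data": data, "prev": prev_key}
    # Derive the entry key from its content so entries spread over the keyspace.
    serialized = json.dumps(entry, sort_keys=True).encode("utf-8")
    entry_key = hashlib.sha256(serialized).hexdigest()
    dht.put(entry_key, entry)
    dht.put(HEAD_KEY, entry_key)
    return entry_key


def updates_since(dht, last_checked_millis):
    """Walk back from the head until reaching an entry that is not newer."""
    result = []
    key = dht.get(HEAD_KEY)
    while key is not None:
        entry = dht.get(key)
        if entry is None or entry["millis"] <= last_checked_millis:
            break
        result.append((entry["millis"], entry["data"]))
        key = entry["prev"]
    result.reverse()  # oldest first
    return result
```

A client would call `updates_since(dht, last_checked_timestamp)` on startup and again at each rescan; it only ever needs to know the one fixed key, not a range of keys.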

Of course such a scheme means that you spread your data over multiple keys, which makes it more brittle and easier to corrupt. That's why I mentioned skip lists, which provide some redundancy in the structure even if nodes get lost.
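To make the redundancy point concrete, here is a small Python sketch in which each entry links to the entries 1, 2 and 4 positions before it. This is a simplified, deterministic stand-in for a real (probabilistic) skip list, and the plain `dict` store, the `entry-N` key names and the function names are all hypothetical. The point is only that a reader can still reach older entries when one entry has been lost.

```python
def back_pointers(keys, max_links=3):
    """Keys of the entries 1, 2, 4, ... positions before the new one."""
    links = []
    step = 1
    while step <= len(keys) and len(links) < max_links:
        links.append(keys[-step])
        step *= 2
    return links


def build(store, payloads):
    """Write entries in order; return the key of the newest one (the head)."""
    keys = []
    for seq, data in enumerate(payloads):
        key = f"entry-{seq}"  # hypothetical key naming
        store[key] = {"seq": seq, "data": data, "links": back_pointers(keys)}
        keys.append(key)
    return keys[-1] if keys else None


def reachable(store, head_key):
    """Collect every entry still reachable from the head, oldest first."""
    seen = set()
    stack = [head_key]
    found = []
    while stack:
        key = stack.pop()
        if key is None or key in seen:
            continue
        seen.add(key)
        entry = store.get(key)
        if entry is None:
            continue  # lost entry: rely on the other links to get past it
        found.append(entry)
        stack.extend(entry["links"])
    return sorted(found, key=lambda e: e["seq"])
```

With a plain linked list, losing one entry would cut off everything older than it; with the extra links, only the lost entry itself goes missing.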

And you will need signatures so that a client can verify that all those mutable data entries belong to the same data structure. In some circumstances signatures can be substituted with simple hashes.
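The hash case can be sketched as follows, assuming each entry is stored under the SHA-256 hash of its own serialized content, so the pointer doubles as a checksum. The function names and the `dict` store are hypothetical. Note that this only covers immutable entries; a mutable entry such as the head pointer has no fixed hash, so it would still need a signature, which is not shown here.

```python
import hashlib
import json


def content_key(entry):
    """Key under which an immutable entry is stored: the hash of its content."""
    serialized = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(serialized).hexdigest()


def fetch_verified(store, key):
    """Fetch an entry and check that it matches the key that points to it."""
    entry = store.get(key)
    if entry is None:
        return None
    if content_key(entry) != key:
        raise ValueError("entry does not match the key that points to it")
    return entry
```

Because every pointer is itself a hash, verifying the head is enough to verify everything reachable from it, one hop at a time.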

the8472