Dynamic vector clock reconstruction with multiple nodes

Question

I am using a dynamic vector clock for my application with multiple nodes. Each node has a unique ID, which is stored alongside its clock in the vector clock. I need to turn the vector clock into a textual representation. My current solution is to build a hash over all IDs that are part of the vector clock. This, however, requires me to search for the matching hash in the product space of all node names.

For example, I have 3 nodes with the (simplified) IDs "a", "b" and "c" and the clocks 3, 6 and 4. To avoid storing them as "a:3-b:6-c:4", I join the IDs into "a\nb\nc" and compute a hash of that string. In the end I have a string of the form "hash:3-6-4", which keeps the vector clock short even with a lot of nodes.

This dynamic vector clock should be able to take on new nodes over time. E.g. if we add "d:1" to the vector clock above, I take the hash of "a\nb\nc\nd" and produce "hash:3-6-4-1".
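A minimal sketch of the encoding described above may make the format concrete. The question does not say which hash function is used, so SHA-1 truncated to 8 hex characters is an assumption here, as are the names `encode_clock` and `DIGEST_LEN`; the counters are written in the order in which the nodes joined.

```python
import hashlib
from typing import Dict

DIGEST_LEN = 8  # assumption: the question does not state the hash or its length


def encode_clock(clock: Dict[str, int]) -> str:
    """Encode {node_id: counter} as '<hash of IDs>:<c1>-<c2>-...'."""
    ids = list(clock)  # dicts keep insertion order, i.e. the join order of the nodes
    joined = "\n".join(ids)  # e.g. "a\nb\nc"
    digest = hashlib.sha1(joined.encode("utf-8")).hexdigest()[:DIGEST_LEN]
    counters = "-".join(str(clock[node_id]) for node_id in ids)
    return f"{digest}:{counters}"


clock = {"a": 3, "b": 6, "c": 4}
encoded = encode_clock(clock)         # "<hash of a,b,c>:3-6-4"

clock["d"] = 1                        # a new node joins with clock 1
encoded_with_d = encode_clock(clock)  # "<hash of a,b,c,d>:3-6-4-1"
```

The hash prefix changes whenever the ID list changes, so a receiver cannot derive the new ID list from the old prefix alone.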

If I now receive this vector clock on any node, I want to be able to reconstruct the IDs from the hash in order to work with them locally. My current implementation is not practical for more than 15 nodes at once, as reconstructing the IDs from the hash is too expensive.
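To illustrate why this reconstruction gets expensive, here is a hedged sketch of a brute-force search. It is not the actual implementation, which is not shown in the question. It assumes that the receiver knows a pool of candidate IDs (`known_ids`), that the hash is SHA-1 truncated to 8 hex characters, and that the order of the IDs matters; `ids_digest` and `reconstruct_ids` are hypothetical names.

```python
import hashlib
from itertools import permutations
from typing import Iterable, List, Optional, Sequence


def ids_digest(ids: Iterable[str]) -> str:
    """Hash of the newline-joined IDs (assumed: SHA-1, first 8 hex characters)."""
    return hashlib.sha1("\n".join(ids).encode("utf-8")).hexdigest()[:8]


def reconstruct_ids(encoded: str, known_ids: Sequence[str]) -> Optional[List[str]]:
    """Try every ordered selection of known IDs until one matches the hash."""
    digest, _, counters = encoded.partition(":")
    n = len(counters.split("-"))  # one counter per node in the clock
    for candidate in permutations(known_ids, n):
        if ids_digest(candidate) == digest:
            return list(candidate)
    return None  # no ordered selection of the known IDs matches


encoded = ids_digest(["a", "b", "c"]) + ":3-6-4"
recovered = reconstruct_ids(encoded, ["c", "a", "d", "b"])
```

Under these assumptions, a clock with n entries drawn from m known IDs needs up to m!/(m-n)! hash computations. With 15 IDs that are all part of the clock, that is 15!, roughly 1.3 × 10^12 orderings in the worst case.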

Is there an efficient algorithm or data structure that would allow me to solve this problem more intelligently?

Thanks a lot in advance for your input.

Are the clock vectors typically sparse or dense? That is, does a typical vector include only a very small subset of all possible nodes (say, <1%) or do most vectors typically contain many nodes (say, >10% of all nodes)? And also, is the total node space limited, or can new nodes be dynamically added to the system (and if so, does this happen often)? — Ilmari Karonen, Aug 19 '16 at 08:58
I only use one vector clock, which is shared across the nodes; therefore this vector clock contains all nodes. The node space is not limited, and new nodes can be added dynamically to the system. That's the point where I ran into the problem stated above: as the number of nodes increases, recreating the nodes from the hash becomes very expensive. — Matthias Büttner, Aug 22 '16 at 05:47
I have some ideas on how this might be solvable, but I'd like to know a bit more about your constraints first. For example, do you really need strong clock consistency, or would a scheme like plausible clocks (or even just basic Lamport timestamps) do? Also, is it feasible to broadcast a message across the whole network whenever a new node joins (or leaves; I assume that can also happen, or else your vectors would grow without bound over time anyway)? And how much do you trust these nodes -- is it safe to assume that they won't deliberately misbehave or lie to each other? — Ilmari Karonen, Aug 22 '16 at 12:24
... Also, I just re-read your question, and realized that I may have misinterpreted it. I was assuming that your problem was minimizing communications overhead between the nodes, but now, reading your question again, it seems like you're actually concerned with logging the events and analyzing the log afterwards. Could you clarify which issue (or both or neither; I may still be misunderstanding your situation) you're asking about? — Ilmari Karonen, Aug 22 '16 at 12:27
Yes, I do need strong consistency in the vector clock. Timestamps or plausible clocks won't be sufficient. And I also need every node to be notified about a newly added node. As I rely on a strictly monotonically increasing accumulated clock count, there is no way for nodes to leave the clock (I do know that this will grow over time). There are cases where some nodes won't increase their counter anymore, but I cannot remove them from the vector clock. The last constraint is that I must be able to rely fully on the node count at any time. — Matthias Büttner, Aug 22 '16 at 13:04
The actual problem I have is that with my current implementation I combine all node IDs into one single hash, which I receive on every other node. As I need to work with every node ID on all nodes, I need to reconstruct the IDs from the hash, which takes a lot of time as the node count increases. Therefore I am looking for an algorithm or a data structure which allows me to easily reconstruct all IDs from the hash (or something similar) without writing every ID to the vector clock explicitly. I hope this helps clarify my issue. And thanks for your efforts, by the way. — Matthias Büttner, Aug 22 '16 at 13:11

0 Answers