
Kademlia uses the XOR metric. Among other things, this metric has the so-called "unidirectional" property: for any given point x and distance e > 0, there is exactly one point y such that d(x,y) = e.
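As a minimal sketch of that property, assuming node IDs are plain non-negative integers: since XOR is its own inverse, the single point at distance e from x is x xor e. The function names below are illustrative and not taken from any Kademlia implementation.

```python
def xor_distance(x: int, y: int) -> int:
    """Kademlia's XOR metric on integer node IDs."""
    return x ^ y


def point_at_distance(x: int, e: int) -> int:
    """Return the y with xor_distance(x, y) == e (XOR is its own inverse)."""
    return x ^ e


def count_points_at_distance(x: int, e: int, bits: int) -> int:
    """Brute-force count of IDs y in a `bits`-bit space with d(x, y) == e."""
    return sum(1 for y in range(1 << bits) if xor_distance(x, y) == e)


if __name__ == "__main__":
    x = 100
    for e in (1, 10, 62):
        y = point_at_distance(x, e)
        print(f"d({x}, {y}) = {xor_distance(x, y)}")
```

In an 8-bit ID space, `count_points_at_distance(100, e, 8)` should come out as 1 for every e from 1 to 255.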

The first question is general: is this property of the metric critical for Kademlia to function, or does it merely help relieve pressure on certain nodes (as the original paper suggests)? In other words, if we want to change the metric, how important is it to come up with a metric that is "unidirectional" as well?

The second question concerns a concrete change of the metric: assuming node identifiers (addresses) are X-bit numbers, would either of the following metrics work with Kademlia?

  1. d(x,y) = abs(x-y)
  2. d(x,y) = abs(x-y) + 1/(x xor y)

The first metric is simply the absolute difference between the two numbers, so for node ID 100 the nodes with IDs 90 and 110 are equally distant, which means the metric is not unidirectional. The second metric fixes that by adding 1/(x xor y): since (x xor y) is unidirectional, the 1/(x xor y) term should break the ties and restore the property.

Thus, for node ID 100, the distance to node ID 90 is d(100,90) = 10 + 1/62, while the distance to node ID 110 is d(100,110) = 10 + 1/10.
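The two candidate metrics can be compared with a small sketch that uses exact fractions to avoid rounding. It assumes d(x,x) = 0 for the second metric, which the formula leaves undefined (division by zero); the function names and the brute-force tie check are illustrative only.

```python
from fractions import Fraction


def d_abs(x: int, y: int) -> int:
    """Metric 1: absolute difference of the IDs."""
    return abs(x - y)


def d_mixed(x: int, y: int) -> Fraction:
    """Metric 2: abs(x-y) + 1/(x xor y), computed exactly."""
    if x == y:
        # Assumption: the formula is undefined here, so treat d(x, x) as 0.
        return Fraction(0)
    return abs(x - y) + Fraction(1, x ^ y)


def has_ties(dist, bits: int) -> bool:
    """True if some x has two different IDs y at the same distance."""
    n = 1 << bits
    for x in range(n):
        seen = set()
        for y in range(n):
            value = dist(x, y)
            if value in seen:
                return True
            seen.add(value)
    return False


if __name__ == "__main__":
    print(d_abs(100, 90), d_abs(100, 110))      # tie under metric 1
    print(d_mixed(100, 90), d_mixed(100, 110))  # no tie under metric 2
    print(has_ties(d_abs, 8), has_ties(d_mixed, 8))
```

The check only covers uniqueness of distances in a small ID space; it says nothing about whether the second formula satisfies the triangle inequality or any other property Kademlia relies on.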

Wapac

1 Answer


You wouldn't be dealing with Kademlia anymore. There are many other routing algorithms that use different distance metrics, some even non-uniform ones, but they do not rely on Kademlia-specific assumptions and sometimes incorporate other features to compensate for undesirable aspects of those metrics.

With the first metric there can be ties (two candidates at each distance from a point), so lookups could no longer converge on a precise set of closest nodes.

Bucket splitting and other routing table maintenance algorithms would need to be changed, since they assume that two nodes can be at the same distance from a point only if they are the same node.

I'm not sure whether it would affect the Big-O properties or other guarantees of Kademlia.

Anyway, this seems like an X-Y problem. You want to modify the metric to serve a particular goal. Maybe you should look for routing overlays designed with that goal in mind instead.

d(x,y) = abs(x-y) + 1/(x xor y)

This seems impractical: integer division suffers from rounding, and in reality you would not be dealing with such small numbers but with much larger ones (e.g. 160-bit), which makes division more expensive too.

the8472
  • `d(x,y) = abs(x-y) + 1/(x xor y)` was meant at the theoretical level. In real code one would rather use an implementation without the division. For example, you could imagine the IDs being 320-bit numbers with the least significant half empty (zeros); the distance function would then produce a 320-bit number whose upper 160 bits are `x-y` of the upper 160 bits of the IDs and whose lower 160 bits are `x xor y`. This should preserve all attributes of the original XOR metric, right? – Wapac Oct 25 '16 at 16:10
  • I would also be very interested to hear about those other algorithms, as I only know Kademlia, Chord, and Pastry, of which Kademlia seems the most appropriate for my use case, with the only exception being the metric function. – Wapac Oct 25 '16 at 16:12
  • How about actually describing your use case? Anyway, some useful Google keywords are "p2p overlay network", "routing" and "distance metric" (applicable in various combinations). E.g. one paper I've read used the Levenshtein distance, but had to dynamically adjust node positions due to its awful clustering behavior. CAN would be an example that uses a higher-dimensional metric. – the8472 Oct 25 '16 at 17:19
  • The use case is that we would like to have a network whose topology we can control so that some nodes are topologically closer to others, with the benefit that somewhat local queries are processed faster (using fewer routing hops). Kademlia seems to be perfect for this, as it does seem to have this local optimization (the worst case is log N, but if the node IDs are close to each other, the routing takes just one or two steps even if log N is e.g. 10). Hence the idea of using a different metric with XOR as the distinguisher. – Wapac Oct 26 '16 at 07:11
  • There are overlays that provide locality clustering by having multiple routing table tiers based on latency. That said, Kademlia can already provide fairly low latencies simply because lookups are performed in parallel and can greedily exploit the responses with the lowest latencies, even if the path taken has a few more hops. Tiered routing tables might not actually be that much faster; they might just be a bit more efficient than greedy lookups. Additionally, an implementation can stream results, so you can already act on preliminary results during a lookup. So no new metric is needed. – the8472 Oct 26 '16 at 08:04
  • Thanks for your insight. I think I understand what you are saying, but the topology control requirement in our case goes beyond just the latency advantage. So, if I understand you correctly, you are saying that if latency were the only reason, then no new metric would be needed. But if I really require the level of topology control that I get with the different metric, I think it will work as expected. That being said, I should still challenge our assumptions about the topology control requirement and consider whether Kademlia with the default metric would be good enough. In any case, thank you! – Wapac Oct 26 '16 at 08:48
  • I recommend asking another question with your actual requirements. And for modifying kademlia, I would recommend running some simulations and comparing it to vanilla kademlia. My intuition is that it could kinda work but there might be some suboptimal edge-cases. – the8472 Oct 26 '16 at 09:23
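
The division-free variant proposed in the comments can be sketched as below, under a few assumptions: IDs are held as 160-bit Python integers rather than 320-bit values with a zeroed lower half, the upper half of the distance is `abs(x - y)`, and the names `ID_BITS` and `composite_distance` are hypothetical. It only illustrates the bit layout and is not a tested replacement for the XOR metric.

```python
ID_BITS = 160  # assumed ID width, matching the 160-bit example in the thread


def composite_distance(x: int, y: int) -> int:
    """Pack abs(x - y) into the upper half and x xor y into the lower half.

    The result is a 2 * ID_BITS wide integer, so comparing two distances
    compares abs(x - y) first and falls back to x xor y only on a tie.
    """
    if not (0 <= x < (1 << ID_BITS) and 0 <= y < (1 << ID_BITS)):
        raise ValueError("IDs must fit in ID_BITS bits")
    return (abs(x - y) << ID_BITS) | (x ^ y)


if __name__ == "__main__":
    near_90 = composite_distance(100, 90)
    near_110 = composite_distance(100, 110)
    print(near_90 != near_110)  # the tie from abs(x - y) alone is broken
    print(near_110 < near_90)   # here the smaller xor value sorts first
```

Because the lower half is `x xor y` itself, no two distinct IDs are at the same distance from a given x. Note that the tie-break order is the reverse of the `1/(x xor y)` formula: there a larger XOR value gives the smaller distance, so 90 sorts before 110, whereas in this sketch 110 sorts before 90.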