
I understand that the Kademlia routing table is made up of 160 buckets.

Nodes are put into buckets 0-159 depending on their prefix length, which is the number of leading zero bits in the XOR of the local node's ID and the other node's ID.

Why is this so? Are there any performance benefits involved (other than the fact that iterating through 160*20 nodes to find the closest is infeasible)?

Arvid
liamzebedee
  • Yes there are performance benefits. This helps keep lookup time to O(log n). Full explanation at http://gleamly.com/article/introduction-kademlia-dht-how-it-works – Joshua Kissoon Aug 06 '15 at 01:48

3 Answers


Kademlia uses the XOR of 2 node IDs as a measure of their distance apart. The idea with the routing table buckets is that a node has detailed knowledge of the network "close" to it and less knowledge the further you get from its ID.
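
For concreteness, here is a minimal Python sketch (the names and the printed example are mine, not from the paper) of the two operations everything else builds on: the XOR distance between two IDs and the bucket index, i.e. the length of their shared prefix:

    import random

    ID_BITS = 160  # SHA-1 sized IDs, as in the original paper

    def distance(a, b):
        # XOR metric: the bitwise XOR of the two IDs, read as an integer
        return a ^ b

    def bucket_index(local_id, other_id):
        # Prefix length = number of leading zero bits in the XOR distance.
        # A high index means a tiny region "close" to us; index 0 covers the
        # far half of the entire ID space.
        d = distance(local_id, other_id)
        return ID_BITS - d.bit_length() if d else ID_BITS

    local = random.getrandbits(ID_BITS)
    other = random.getrandbits(ID_BITS)
    print(bucket_index(local, other))  # almost always small: random IDs are "far apart"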

To find a group of nodes closest to a particular key, a node would first query the closest ones it knows about from its own routing table. On average, half of all lookups will fall into the address space covered by bucket 0. But that bucket is only allowed to contain 20 nodes, so there is little chance that these nodes are the actual closest. However, each node in that bucket will have more detailed knowledge of that part of the address space, and will likely be able to provide a better list of close nodes from its own routing table, which can then also be queried, and so on.

In this way, an iterative lookup very quickly homes in on the actual closest group of nodes.
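
To make the convergence concrete, here is a small self-contained simulation (my own toy model, not the paper's α-parallel lookup: it greedily hops to the single best contact seen so far, and each routing table keeps just a random sample of K contacts per bucket):

    import random

    random.seed(1)
    ID_BITS = 32   # small keyspace to keep the demo fast; real Kademlia uses 160
    K = 8          # bucket size
    N = 1000       # simulated network size

    ids = random.sample(range(2 ** ID_BITS), N)

    def bucket_index(a, b):
        d = a ^ b
        return ID_BITS - d.bit_length() if d else ID_BITS

    # Each node keeps at most K contacts per bucket (a random sample stands in
    # for Kademlia's "oldest live contacts" rule).
    tables = {}
    for a in ids:
        buckets = {}
        for b in ids:
            if a != b:
                buckets.setdefault(bucket_index(a, b), []).append(b)
        tables[a] = {i: random.sample(v, min(K, len(v))) for i, v in buckets.items()}

    def closest_known(node, target):
        contacts = [c for bucket in tables[node].values() for c in bucket]
        return min(contacts, key=lambda c: c ^ target)

    def lookup(start, target):
        # Repeatedly hop to the closest contact the current node knows about.
        current, hops = start, 0
        while True:
            candidate = closest_known(current, target)
            if candidate ^ target >= current ^ target:
                return current, hops
            current, hops = candidate, hops + 1

    target = random.getrandbits(ID_BITS)
    node, hops = lookup(random.choice(ids), target)
    print(hops, "hops to converge among", N, "nodes")  # a handful, roughly O(log N)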

When you say

iterating through 160*20 nodes to find the closest is infeasible

I take it you mean that actually querying each of them would be infeasible; since iterating through such a list to extract the closest ones is exactly how lookup requests (RPCs) are handled by the nodes.

Note that in real-world scenarios, it would be very unlikely for the number of buckets to get anywhere near 160. For example, for a network of a billion nodes, the average bucket count would be 27.
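
A rough back-of-the-envelope for that figure, assuming uniformly distributed node IDs (the exact constant depends on the implementation's splitting rules):

    import math

    N = 1_000_000_000   # one billion nodes
    K = 20              # bucket size

    # Bucket i covers nodes whose XOR distance to us has prefix length i,
    # i.e. a fraction 2^-(i+1) of the ID space, so it expects N / 2^(i+1)
    # candidates. Once that expectation drops below K the bucket can no
    # longer fill, and a few indices beyond that it is almost certainly empty.
    deepest_full = max(i for i in range(160) if N / 2 ** (i + 1) >= K)
    print(deepest_full)        # 24 -> buckets 0..24 are normally full
    print(math.log2(N / K))    # ~25.6, the same estimate in closed form

Counting the couple of sparsely populated buckets just beyond that point lands you in the high twenties.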

As to the actual values chosen in the original Kademlia paper, I don't know why the bucket size is specified as 20. However, I imagine that 160 is derived from the bit-size of a SHA1 hash. Normally a hash function is used to generate the key for a given value to be stored. If the risk of a hash-collision using SHA1 is tolerably low, then this is fine. If not, a different hash algorithm could be used, e.g. SHA256 would yield up to 256 buckets.

Fraser

Why is this so? Are there any performance benefits involved

This organization, combined with the XOR metric, creates tiered locality: it guarantees that a node somewhat closer to your target will know even closer nodes, which yields exponential convergence.

Maybe you can think of it as a distributed interval search.

I understand that the Kademlia routing table is made up of 160 buckets.

A flat array of (up to) 160 buckets is just a primitive way many implementations use to approximate the correct routing table layout.

With bucket splitting or multihomed routing tables you need an actual tree layout which could contain way more than 160 buckets.
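
As an illustration of how such a tree grows, here is a minimal split-on-full sketch (a hypothetical class of my own, not mldht's code; IDs are fixed-length bit strings). Only buckets whose region covers one of the local IDs are allowed to split, which is why a multihomed table with many local IDs ends up with far more buckets:

    K = 8  # bucket size used in the dump below

    class Bucket:
        def __init__(self, prefix=""):
            self.prefix = prefix    # bit prefix of the ID region this bucket covers
            self.contacts = []      # node IDs, stored as bit strings
            self.children = None    # (subtree for next bit 0, subtree for next bit 1)

        def insert(self, node_id, local_ids):
            if self.children:       # interior node: descend on the next bit
                self.children[int(node_id[len(self.prefix)])].insert(node_id, local_ids)
            elif len(self.contacts) < K:
                self.contacts.append(node_id)
            elif any(lid.startswith(self.prefix) for lid in local_ids):
                self._split(local_ids)              # only "home" buckets split
                self.insert(node_id, local_ids)
            # else: full and not splittable -> drop the contact
            #       (real implementations keep it in a replacement cache)

        def _split(self, local_ids):
            self.children = (Bucket(self.prefix + "0"), Bucket(self.prefix + "1"))
            for c in self.contacts:
                self.children[int(c[len(self.prefix)])].insert(c, local_ids)
            self.contacts = []

Feeding observed node IDs into such a structure produces exactly the kind of uneven tree shown below: deep, fine-grained buckets around the local IDs and shallow, coarse ones everywhere else.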

In fact, here's a small fraction of a tree-based routing table of a multihomed DHT node with 89 node IDs; the complete table is larger than that (these are basically the regions for two of the 89 IDs):

0000000...   entries:8 replacements:8
0000001000...   entries:8 replacements:8
0000001001000...   entries:8 replacements:8
00000010010010...   entries:8 replacements:8
00000010010011000...   entries:8 replacements:8
000000100100110010...   entries:8 replacements:8
0000001001001100110...   entries:8 replacements:8
00000010010011001110...   entries:8 replacements:8
0000001001001100111100...   entries:5 replacements:0
0000001001001100111101...   entries:8 replacements:0
000000100100110011111...   entries:8 replacements:0
0000001001001101...   entries:8 replacements:8
000000100100111...   entries:8 replacements:8
000000100101...   entries:8 replacements:8
00000010011...   entries:8 replacements:8
000000101...   entries:8 replacements:8
00000011...   entries:8 replacements:8
0000010...   entries:8 replacements:8
0000011000...   entries:8 replacements:8
0000011001000...   entries:8 replacements:8
00000110010010...   entries:8 replacements:8
00000110010011000...   entries:8 replacements:8
000001100100110010...   entries:8 replacements:8
0000011001001100110...   entries:8 replacements:8
00000110010011001110...   entries:8 replacements:5
0000011001001100111100...   entries:6 replacements:0
0000011001001100111101...   entries:2 replacements:0
000001100100110011111...   entries:8 replacements:0
0000011001001101...   entries:8 replacements:8
000001100100111...   entries:8 replacements:8
000001100101...   entries:8 replacements:8
00000110011...   entries:8 replacements:8
000001101...   entries:8 replacements:8
00000111...   entries:8 replacements:8

Its lookup cache is even larger and consists of 7k buckets.

the8472
  • What is this multi-homed DHT client? Can you tell any more about it? I wouldn't mind seeing its source. Is it running on the bittorrent DHT? I wrote an MDHT client myself in python – gsk Aug 20 '12 at 06:23
  • It means it's one client running as multiple nodes, e.g. for multiple IPs. Yes, it's the bittorrent dht. Source can be found here: http://azsmrc.svn.sourceforge.net/viewvc/azsmrc/mldht/trunk/ – the8472 Aug 23 '12 at 00:31

This algorithm was created for building P2P file-sharing services around 2001. To put it in context, imagine that each P2P node stores and serves mp3 files. The routing table contains hashes of those files.

With the storage hardware of the time it was not possible to store all the files on each node. The idea was that each P2P user stores some part of this mp3 database on their PC. The maximum of 160*20 = 3200 mp3 files takes up about 15 GB, which felt reasonable.

There has to be a way to distribute the data fairly, and log-distance (based on the prefix length) is one of them. Files whose hashes are "further" away get collisions more often: their corresponding buckets fill up quickly, so it is more random which files end up there. Files whose hashes are "closer" are the ones you, as a peer, are responsible for storing longer. There are fewer of these files, so they fill the buckets more slowly.

other than the fact that iterating through 160*20 nodes to find the closest is infeasible

Comparing 3200 distances is not a big deal nowadays, but yes, the buckets help to find the ones that are "closest" to you for replication.
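
For scale, a minimal illustration of that point (arbitrary numbers): brute-forcing the 20 closest out of 160*20 contacts is a single sort over XOR distances, a negligible amount of work on modern hardware.

    import random

    contacts = [random.getrandbits(160) for _ in range(160 * 20)]
    target = random.getrandbits(160)

    # One sort over 3200 XOR distances is all a node needs to answer
    # a "find the k closest" request from its own contact list.
    closest = sorted(contacts, key=lambda c: c ^ target)[:20]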

battlmonstr