Nodes: Clients on DHT-network.
Peers: Clients trying to download a specific resource.

Suppose that the DHT-network is a connected graph, but NO node can reach ALL other nodes (an assumption contrary to the common belief that the Internet, which the DHT-network overlays, is fully connected).

Is the Peer-network, which overlays the DHT-network, still a connected graph? Why?

the8472
Schezuk
  • Are you asking about the visibility of the DHT's key-value storage, or about a separate network that bootstraps from a DHT based on contact information stored in the DHT? – the8472 Mar 03 '16 at 21:05
  • I guess it is the second one. I am asking whether the connections between BitTorrent clients downloading the same resource always form a connected graph (so that message broadcasting is feasible) rather than isolated groups (not feasible), when all clients are on the same network but connectivity between all clients is not guaranteed because of ISP firewalls or regional blocking. – Schezuk Mar 04 '16 at 01:26
  • As an imagined example, take a 7-node DHT network in which four French nodes and two German nodes cannot access each other, but both groups can access a Dutch node which, however, doesn't act as a proxy. Does Kademlia guarantee that all French nodes are on one connected graph of connections rather than two 2-peer networks? Does Kademlia guarantee that the separated French and German networks join instantly once the Dutch node starts downloading the same resource as the French and German nodes? – Schezuk Mar 04 '16 at 01:40

1 Answer

Kademlia is an abstract algorithm that assumes spherical cows in a vacuum. The only failure modes the paper discusses are churn and temporary graph partitions. Asymmetric reachability is not considered.

Kademlia as implemented in the real world makes no guarantees. Everything is done on a best-effort probabilities-are-good-enough basis.

The main concern in the real world is not the case where an interconnected cluster A cannot talk to an interconnected cluster B. NATs and firewalls do not introduce such clusters on a considerable scale. Instead they create a set of second-class citizens which, absent NAT traversal measures, are not consistently reachable by anyone and thus can only connect to the first-class citizens, which are the nodes that anyone can talk to. Of course a few edge cases exist, but they're largely irrelevant.

Anyway, you're not even asking about Kademlia but about BitTorrent, which is not really an overlay over Kademlia but a separate network that simply bootstraps its contact information from Kademlia, so things get even more complicated. BitTorrent can be implemented over two different transport mechanisms, TCP and µTP, and clients may support different levels of NAT traversal capabilities for TCP, µTP and Kademlia-via-UDP.

Kademlia nodes generally store contact information for bittorrent on several reachable nodes, since they - quite obviously - cannot reach unreachable nodes for the purpose of storage. They also do so with redundancy, which ensures a high likelihood that the stored contact information can be observed by anyone else.
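To make the storage step concrete, here is a minimal sketch of how an announcing node might pick storage targets under Kademlia's XOR metric. The helper names (`key_id`, `xor_distance`, `closest_reachable`), the SHA-1-of-a-label IDs and the default `k=8` are assumptions for illustration, not any client's actual API; real implementations find these nodes through iterative lookups rather than from a global node list.

```python
import hashlib


def key_id(label: str) -> int:
    """Map a label to a 160-bit integer ID (illustrative; real node IDs are chosen differently)."""
    return int.from_bytes(hashlib.sha1(label.encode("utf-8")).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: bitwise XOR of the two IDs, read as an integer."""
    return a ^ b


def closest_reachable(target: int, nodes: dict, k: int = 8) -> list:
    """Return up to k reachable node names, nearest to `target` first.

    `nodes` maps a node name to True if the node is reachable.
    Unreachable nodes are skipped because nothing can be stored on them.
    """
    candidates = [name for name, reachable in nodes.items() if reachable]
    candidates.sort(key=lambda name: xor_distance(key_id(name), target))
    return candidates[:k]
```

Storing the same contact information on all `k` returned nodes is the redundancy mentioned above: a later lookup for the same target only needs to reach one of them.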

Based on that contact information, BitTorrent clients can then attempt to connect to each other. As long as there are some reachable BitTorrent clients, the others will be able to establish direct connections to them and may then additionally attempt some NAT traversal measures between non-reachable nodes. Again, there are no guarantees, so small swarms may fail under some circumstances, but once a swarm becomes large enough the probabilities tip overwhelmingly in favor of the graph becoming connected.
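The probabilistic argument can be explored with a toy Monte Carlo model. This is only a sketch under loud assumptions: each peer is independently reachable with probability `p_reachable`, each peer learns `contacts` random other peers from the DHT, a connection succeeds only if the contacted peer is reachable, and there is no NAT traversal or peer exchange. None of the parameter values are measurements.

```python
import random


def swarm_is_connected(n: int, p_reachable: float, contacts: int, rng: random.Random) -> bool:
    """Build one random swarm and report whether its connection graph is connected."""
    reachable = [rng.random() < p_reachable for _ in range(n)]
    adj = [set() for _ in range(n)]
    for a in range(n):
        others = [b for b in range(n) if b != a]
        # Peer a dials the contacts it learned; only reachable targets accept.
        for b in rng.sample(others, min(contacts, len(others))):
            if reachable[b]:
                adj[a].add(b)
                adj[b].add(a)  # an established connection is usable in both directions
    # Depth-first search from peer 0 to see whether every peer can be visited.
    seen = {0}
    stack = [0]
    while stack:
        cur = stack.pop()
        for nxt in adj[cur]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == n


def connected_fraction(n: int, p_reachable: float = 0.3, contacts: int = 5,
                       trials: int = 200, seed: int = 0) -> float:
    """Estimate the probability that a swarm of n peers forms one connected graph."""
    rng = random.Random(seed)
    hits = sum(swarm_is_connected(n, p_reachable, contacts, rng) for _ in range(trials))
    return hits / trials
```

When every peer learns all the others (`contacts >= n - 1`), this model is connected exactly when at least one peer is reachable, which happens with probability 1 - (1 - p_reachable)^n and therefore approaches 1 as the swarm grows. With `contacts` capped well below `n` the toy model can behave differently, since it leaves out peer exchange and NAT traversal, which real swarms rely on.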

A small additional concern is IPv4 vs. IPv6. Generally IPv6 provides better connectivity (if firewalls don't get in the way), but not all clients implement the IPv6 extensions equally well, which can prevent a few IPv6 edges from forming even where they would in principle provide superior connectivity between the same nodes.

Note that the IPv4 and IPv6 DHTs are in theory independent DHT networks; they just happen to have significant overlap. How to coordinate multiple independent networks is basically outside the scope of Kademlia.

the8472
  • Sorry for my late response. Though Kademlia makes no guarantee, does this imply that a connected graph is **highly probable**, in terms of both *probability* and *statistics*, to form between the peers downloading the same resource? – Schezuk Mar 12 '16 at 02:41
  • The Kademlia paper says that *the node closest* to the resource (does this require nodes to be in the same address space as resources?) stores the routing information of the resource's peers. Is this the mechanism that merges separated graphs that formed spontaneously? And is there any other mechanism? – Schezuk Mar 12 '16 at 02:50
  • BTW, can the Kademlia of BitTorrent or eMule help find peers that support specific BEPs or extensions through its queries, rather than clients asking each other directly and filtering locally? Or is there any other network using a DHT that can? – Schezuk Mar 12 '16 at 02:56
  • That's a lot of followup questions, I don't think they're suitable to the comment format. – the8472 Mar 12 '16 at 08:54
  • Yeah...thank you for reminding me. Should I make 3 more questions? – Schezuk Mar 13 '16 at 10:21