How does DHT work?

Question

I grabbed the basic idea about DHT from wiki:

Store Data:

In a DHT-network, every node is responsible for a specific range of key-space. To store a file in the DHT, first, hash the file's name to get the file's key; second, send a message put(key, file-content) to any node of the DHT, the message will be relayed to the node which is responsible for key and that node will store the pair (key, file-content).

Get Data:

When getting a file from DHT, first, hash the file's name to get the key; second send a message get(key) to any node, relay the message until...

Questions:

To store a file, we can hash the file's name to get its key, but wiki says:

In the real world the key k could be a hash of a file's content rather than a hash of a file's name to provide content-addressable storage, so that renaming of the file does not prevent users from finding it.

Hash file's content? How am I supposed to know the file's content? If I've already know the file's content, then WHY would I search it in the DHT?

According to the wiki, every participating node will spare some space to store files. So does it mean that, if I participate in a DHT, I have to spare 10G disk space to store those files whose key falls into the specific key-space I'm responsible for?
If indeed I should spare some disk space to store those files, then how should I store those (key, file-content) on the disk? I mean, should the file be arranged into a B-tree or something on my disk?
When a query happens, how does my computer respond? I assume, first, check the queried key, if in my key-space, then find the corresponding file on my disk. right?

Consider splitting up your posts i the future. 1. You're not supposed to know the file's content. Just the hash. A .torrent metafile also contains the hash of each file (each piece, actually) in the torrent. — Dennis, Oct 08 '11 at 02:55
I've spent 4 hours night coming to early morning, but I also can't understand the thing. I suppose that VALUES stored in DHT - are IP:port pairs with double semantical forwarding: first to find computers responding for storage of ADDRESSES of another computers. Second step is locating a file itself. So complex... — Brian Cannard, Jun 25 '14 at 03:24

score 3 · Answer 1 · answered Nov 25 '14 at 10:25

A DHT is just an algorithm. At its base it provides distributed key-value PUT and GET operations. Similar to a normal Map or associative array found in many programming languages.

Due to the real-world limitations such as untrustworthy nodes, failure rates and so on actual DHT implementations don't provide an arbitrary-length PUT(<uint8[]>, <uint8[]>) operation.

Example:

The kademlia implementation for bittorrent for example provides the following interfaces:

PUT(uint8[20], uint16)
GET(uint8[20]) -> List<Pair<IP, uint16>> where the list only represents a randomly sampled subset of the actual data

As you can see it actually is a specialized asymmetric interface when compared to more generic associative arrays. The IP address is always derived from the PUT sender's source address, i.e. cannot be explicitly set. And the GET returns a list instead of a single value, so it implements a MultiMap or Map<List>, if you want to see it like that.

In bittorrent's case a hash is used as content descriptor, where peers which have the content announce themselves on the DHT. If someone wants the file(s) they look up IP/Port pairs on the DHT, then contact the peers via a separate protocol and then download the data.

But other uses for a DHT are also possible, i.e. they could store signed, structured data, tweet-like text snippets or whatever. It always depends on your applications' needs.

It's just a basic building block.

How does DHT work?

1 Answers1

Example: