I picked up the basic idea of a DHT from Wikipedia:
Store Data:
In a DHT network, every node is responsible for a specific range of the key space. To store a file in the DHT, first hash the file's name to get the file's key; second, send a message put(key, file-content) to any node of the DHT. The message is relayed to the node that is responsible for key, and that node stores the pair (key, file-content).
Get Data:
When getting a file from the DHT, first hash the file's name to get the key; second, send a message get(key) to any node, and relay the message until...
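To make sure I have the put/get flow right, here is a minimal sketch of how I picture it. Everything in it is my own assumption, not something from Wikipedia: the names key_of, Node and ToyDHT are made up, SHA-1 is just one possible hash, and "the first node whose ID is greater than or equal to the key" is only one way to divide the key space. The relay step is faked with a lookup in a sorted list; a real DHT would forward the message hop by hop.

```python
import hashlib
from bisect import bisect_left


def key_of(name: str) -> int:
    """Hash a name into a 160-bit integer key (SHA-1 is an assumption)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


class Node:
    def __init__(self, node_id: int):
        self.node_id = node_id
        self.store = {}  # key -> file content held by this node


class ToyDHT:
    """Toy model: all nodes live in one sorted list instead of a network."""

    def __init__(self, node_names):
        self.nodes = sorted((Node(key_of(n)) for n in node_names),
                            key=lambda n: n.node_id)
        self.ids = [n.node_id for n in self.nodes]

    def responsible_node(self, key: int) -> Node:
        # First node whose ID is >= key, wrapping around to the start.
        i = bisect_left(self.ids, key) % len(self.nodes)
        return self.nodes[i]

    def put(self, name: str, content: bytes) -> int:
        key = key_of(name)
        self.responsible_node(key).store[key] = content
        return key

    def get(self, name: str):
        key = key_of(name)
        return self.responsible_node(key).store.get(key)


dht = ToyDHT(["node-a", "node-b", "node-c"])
dht.put("report.txt", b"hello")
print(dht.get("report.txt"))
```

In this model exactly one node ends up holding the pair, and any caller that hashes the same name reaches the same node.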
Questions:
- To store a file, we can hash the file's name to get its key, but Wikipedia says:
In the real world the key k could be a hash of a file's content rather than a hash of a file's name to provide content-addressable storage, so that renaming of the file does not prevent users from finding it.
Hash the file's content? How am I supposed to know the file's content? If I already know the file's content, then why would I search for it in the DHT?
- According to Wikipedia, every participating node sets aside some space to store files. Does that mean that if I participate in a DHT, I have to set aside 10 GB of disk space to store the files whose keys fall into the key space I'm responsible for?
- If I do have to set aside disk space for those files, how should I store the (key, file-content) pairs on disk? Should the files be arranged in a B-tree or something similar?
- When a query arrives, how does my computer respond? I assume it first checks the queried key and, if the key is in my key space, finds the corresponding file on my disk. Right?
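To make the last two questions concrete, this is roughly the node-side handling I have in mind. It is only a sketch under my own assumptions: content_key and LocalStore are hypothetical names, the key is a SHA-1 hash of the content, each pair is stored as one plain file named after the hex key (the simplest layout I could think of, not a B-tree), and the node owns a half-open key range [lo, hi). When a key falls outside that range the sketch simply refuses; a real node would presumably relay the request instead.

```python
import hashlib
from pathlib import Path


def content_key(data: bytes) -> str:
    """Key derived from the file's content rather than its name."""
    return hashlib.sha1(data).hexdigest()


class LocalStore:
    """One file per key inside a directory reserved for the DHT."""

    def __init__(self, root, lo: int, hi: int):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.lo, self.hi = lo, hi  # this node owns keys in [lo, hi)

    def owns(self, key_hex: str) -> bool:
        return self.lo <= int(key_hex, 16) < self.hi

    def put(self, key_hex: str, data: bytes) -> bool:
        if not self.owns(key_hex):
            return False  # not my key space; a real node would relay
        (self.root / key_hex).write_bytes(data)
        return True

    def get(self, key_hex: str):
        if not self.owns(key_hex):
            return None  # not my key space; a real node would relay
        path = self.root / key_hex
        return path.read_bytes() if path.exists() else None
```

With content-derived keys, whoever receives the file can recompute content_key on the returned bytes and compare it with the key that was requested.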