I have an idea to implement a real-time keyword-based torrent search mechanism using the existing BitTorrent DHT, and I would like to know if it is feasible and realistic.
We have a torrent, and we would like to be able to find it from a keyword using the DHT only.
- H is a hash function with a 20-byte output
- infohash is the info_hash of the torrent (20 bytes)
- sub(hash, i) returns 2 bytes of hash starting at byte i (for example, sub(0x62616463666568676a696c6b6e6d706f72717473, 2) = 0x6463)
- announce_peer(hash, port) publishes a fake peer associated with a fake info_hash hash. The IP of the fake peer is irrelevant and we use the port number to store data (2 bytes).
- get_peers(hash) retrieves fake peers associated with fake info_hash hash. Let's consider that this function returns a list of port numbers only.
- a ++ b means concatenate a and b (for example, 0x01 ++ 0x0203 = 0x010203)
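To make the notation concrete, here is a minimal Python sketch of H, sub and ++. It assumes H is SHA-1 (the post only requires a 20-byte output, so any such hash would do) and represents hashes as bytes objects, so ++ is plain bytes concatenation.

```python
import hashlib


def H(data: bytes) -> bytes:
    """20-byte hash. SHA-1 is an assumption; the post only asks for 20 bytes."""
    return hashlib.sha1(data).digest()


def sub(hash_: bytes, i: int) -> bytes:
    """Return the 2 bytes of hash_ starting at byte offset i."""
    return hash_[i:i + 2]


infohash = bytes.fromhex("62616463666568676a696c6b6e6d706f72717473")

print(sub(infohash, 2).hex())          # 6463, the example from the list above
print((b"\x01" + b"\x02\x03").hex())   # 010203, i.e. 0x01 ++ 0x0203
```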
Publication
id <- sub(infohash, 0)
announce_peer( H( 0x0000 ++ 0x00 ++ keyword ), id )
announce_peer( H( id ++ 0x01 ++ keyword ), sub(infohash, 2 ))
announce_peer( H( id ++ 0x02 ++ keyword ), sub(infohash, 4 ))
announce_peer( H( id ++ 0x03 ++ keyword ), sub(infohash, 6 ))
announce_peer( H( id ++ 0x04 ++ keyword ), sub(infohash, 8 ))
announce_peer( H( id ++ 0x05 ++ keyword ), sub(infohash, 10))
announce_peer( H( id ++ 0x06 ++ keyword ), sub(infohash, 12))
announce_peer( H( id ++ 0x07 ++ keyword ), sub(infohash, 14))
announce_peer( H( id ++ 0x08 ++ keyword ), sub(infohash, 16))
announce_peer( H( id ++ 0x09 ++ keyword ), sub(infohash, 18))
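The publication step could be sketched in Python as below. Everything here is a stand-in for illustration: FakeDHT is a hypothetical in-memory dictionary, not a real DHT client, H is assumed to be SHA-1, and ports are treated as big-endian 2-byte integers. A real announce_peer would also involve the usual get_peers token exchange, which this sketch ignores.

```python
import hashlib


def H(data: bytes) -> bytes:
    # Assumption: SHA-1 as the 20-byte hash function.
    return hashlib.sha1(data).digest()


def sub(hash_: bytes, i: int) -> bytes:
    # 2 bytes of hash_ starting at byte i.
    return hash_[i:i + 2]


class FakeDHT:
    """Hypothetical in-memory stand-in for the DHT: fake info_hash -> ports."""

    def __init__(self):
        self.store = {}

    def announce_peer(self, key: bytes, port: int) -> None:
        self.store.setdefault(key, []).append(port)

    def get_peers(self, key: bytes) -> list:
        return list(self.store.get(key, []))


def publish(dht: FakeDHT, infohash: bytes, keyword: bytes) -> None:
    """Store the 20-byte infohash as ten 2-byte 'ports' under keyword-derived keys."""
    id_ = sub(infohash, 0)
    # Index entry: lets a searcher discover the ids published for this keyword.
    dht.announce_peer(H(b"\x00\x00" + b"\x00" + keyword), int.from_bytes(id_, "big"))
    # Entries 0x01..0x09 carry bytes 2..19 of the infohash, two bytes each.
    for n in range(1, 10):
        chunk = sub(infohash, 2 * n)
        dht.announce_peer(H(id_ + bytes([n]) + keyword), int.from_bytes(chunk, "big"))


dht = FakeDHT()
infohash = bytes.fromhex("62616463666568676a696c6b6e6d706f72717473")
publish(dht, infohash, b"ubuntu")
print(len(dht.store))  # number of fake info_hashes announced for one keyword
```

Each keyword costs ten announces per torrent in this scheme, which is worth keeping in mind when estimating the load on the DHT.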
Search
ids <- get_peers(H( 0x0000 ++ 0x00 ++ keyword ))
foreach (id : ids)
{
    part1 <- get_peers(H( id ++ 0x01 ++ keyword ))[0]
    part2 <- get_peers(H( id ++ 0x02 ++ keyword ))[0]
    part3 <- get_peers(H( id ++ 0x03 ++ keyword ))[0]
    part4 <- get_peers(H( id ++ 0x04 ++ keyword ))[0]
    part5 <- get_peers(H( id ++ 0x05 ++ keyword ))[0]
    part6 <- get_peers(H( id ++ 0x06 ++ keyword ))[0]
    part7 <- get_peers(H( id ++ 0x07 ++ keyword ))[0]
    part8 <- get_peers(H( id ++ 0x08 ++ keyword ))[0]
    part9 <- get_peers(H( id ++ 0x09 ++ keyword ))[0]
    result_infohash <- id ++ part1 ++ part2 ++ ... ++ part9
    print("search result:" ++ result_infohash)
}
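A round trip of publication followed by search could look like the Python sketch below. As before, FakeDHT is a hypothetical in-memory stand-in rather than a real DHT client, and SHA-1 for H and big-endian ports are assumptions. Like the pseudocode, it takes only the first value returned for each part, so it does not handle two torrents sharing the same id.

```python
import hashlib


def H(data: bytes) -> bytes:
    # Assumption: SHA-1 as the 20-byte hash function.
    return hashlib.sha1(data).digest()


class FakeDHT:
    """Hypothetical in-memory stand-in for the DHT: fake info_hash -> ports."""

    def __init__(self):
        self.store = {}

    def announce_peer(self, key: bytes, port: int) -> None:
        self.store.setdefault(key, []).append(port)

    def get_peers(self, key: bytes) -> list:
        return list(self.store.get(key, []))


def publish(dht: FakeDHT, infohash: bytes, keyword: bytes) -> None:
    id_ = infohash[0:2]
    dht.announce_peer(H(b"\x00\x00" + b"\x00" + keyword), int.from_bytes(id_, "big"))
    for n in range(1, 10):
        chunk = infohash[2 * n:2 * n + 2]
        dht.announce_peer(H(id_ + bytes([n]) + keyword), int.from_bytes(chunk, "big"))


def search(dht: FakeDHT, keyword: bytes) -> list:
    """Rebuild every infohash whose ten parts can all be found for this keyword."""
    results = []
    for id_port in dht.get_peers(H(b"\x00\x00" + b"\x00" + keyword)):
        id_ = id_port.to_bytes(2, "big")
        parts = [id_]
        for n in range(1, 10):
            ports = dht.get_peers(H(id_ + bytes([n]) + keyword))
            if not ports:
                break  # a part is missing, so this result cannot be rebuilt
            parts.append(ports[0].to_bytes(2, "big"))  # first value only, as in the pseudocode
        else:
            results.append(b"".join(parts))
    return results


dht = FakeDHT()
infohash = bytes.fromhex("62616463666568676a696c6b6e6d706f72717473")
publish(dht, infohash, b"ubuntu")
found = search(dht, b"ubuntu")
print("search result:", [h.hex() for h in found])
```

A search for one keyword costs one lookup for the index plus nine lookups per id returned.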
I know there would be collisions on id (it is only 2 bytes), but with relatively specific keywords it should work...
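To get a feel for how quickly id collisions appear, the birthday bound can be computed directly. This sketch assumes the first 2 bytes of each infohash are uniformly distributed over the 65536 possible values and that n torrents are published under the same keyword; collision_probability is a helper name made up for this example.

```python
def collision_probability(n: int, id_space: int = 2 ** 16) -> float:
    """Probability that at least two of n torrents share the same 2-byte id,
    assuming ids are uniformly distributed over id_space values."""
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (id_space - k) / id_space
    return 1.0 - p_all_distinct


for n in (2, 10, 100, 300, 1000):
    print(n, round(collision_probability(n), 4))
```

Under that assumption the odds of at least one collision reach roughly one half at around 300 torrents per keyword, so the scheme depends on keywords being specific enough to stay well below that.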
We could also build more specific keywords by concatenating several words in alphanumeric order. For example, if we have words A, B and C associated with a torrent, we could publish keywords A, B, C, A ++ B, A ++ C, B ++ C and A ++ B ++ C.
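Generating the compound keywords could be sketched as follows. The helper name compound_keywords is made up for this example, and Python's default string sort stands in for "alphanumeric order"; a real implementation would probably also normalize case and punctuation first.

```python
from itertools import combinations


def compound_keywords(words):
    """All non-empty combinations of the words, each joined in sorted order."""
    ordered = sorted(set(words))
    return [
        "".join(combo)
        for size in range(1, len(ordered) + 1)
        for combo in combinations(ordered, size)
    ]


print(compound_keywords(["B", "C", "A"]))
# ['A', 'B', 'C', 'AB', 'AC', 'BC', 'ABC']
```

For n words this yields 2^n - 1 keywords, and each keyword costs ten announces in the scheme above, so three words already mean 70 announces per torrent.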
So, is this awful hack feasible :D ? I know that Retroshare is using BitTorrent's DHT.