
I am working on a Mainline DHT implementation and I have seen some strange behaviour.

Let’s say I know a node's IP and port: 1.1.1.1:7777. I send a "find_node" request to it with my own node hash as the target. I get 8 nodes back; let’s say the first one's hash is abcdeabcdeabcdeabcde and its address is 2.2.2.2:8888. Now I send a "ping" request to 2.2.2.2:8888, and that node responds with a completely different hash than the one I got from 1.1.1.1:7777 in the "find_node" response. And I can see this is not an isolated case. What’s going on? Why are the hashes of the same node different when they come from two different sources? Thanks for any answer.
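Roughly, this is what I am doing, as a simplified sketch (I'm using the third-party bencodepy package here just for illustration; the transaction ID, timeouts and error handling are stripped down, and the addresses are the placeholder ones from above):

```python
# Sketch of the exchange: find_node to a known node, then ping one of the
# returned contacts and compare the two node IDs.
import os
import socket
import bencodepy  # assumption: third-party bencode library, used only for brevity

MY_ID = os.urandom(20)  # placeholder local node ID

def krpc(addr, query, args, timeout=5.0):
    """Send a single KRPC query over UDP and return the decoded response dict."""
    msg = {b"t": b"aa", b"y": b"q", b"q": query, b"a": args}
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    sock.sendto(bencodepy.encode(msg), addr)
    data, _ = sock.recvfrom(65536)
    return bencodepy.decode(data)

# 1. find_node towards the known node, using our own ID as the target
resp = krpc(("1.1.1.1", 7777), b"find_node", {b"id": MY_ID, b"target": MY_ID})
nodes = resp[b"r"][b"nodes"]          # compact node info, 26 bytes per contact

contacts = []
for i in range(0, len(nodes), 26):
    chunk = nodes[i:i + 26]
    node_id = chunk[:20]              # 20-byte ID as reported by 1.1.1.1
    ip = socket.inet_ntoa(chunk[20:24])
    port = int.from_bytes(chunk[24:26], "big")
    contacts.append((node_id, ip, port))

# 2. ping the first returned contact and compare what it claims as its own ID
reported_id, ip, port = contacts[0]
pong = krpc((ip, port), b"ping", {b"id": MY_ID})
claimed_id = pong[b"r"][b"id"]
print("IDs match" if claimed_id == reported_id else "ID mismatch")
```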

Latk12

2 Answers


It could be that 2.2.2.2:8888 does not know its external port/address, or hasn't updated it yet. Hence the different hashes.

Adam
  • Please give a bit more detail. I don't understand how the fact that 2.2.2.2:8888 does not know its external port/address relates to the fact that the 1.1.1.1:7777 node stores a wrong hash for it. I understand that nodes can change their hash, but per the protocol you should ping your known nodes at least once every 15 minutes, so you would get the correct new hash. – Latk12 Mar 13 '20 at 09:22
  • It is because the hash is derived from the IP address and port. A node can update its own ID when it learns its external address. Until it knows the external one, the ID is computed from the local IP address (e.g. 192.168.0.1) and local port (e.g. 6881). But when another node (in your case 1.1.1.1) sees this one (2.2.2.2), it computes the ID from the IP and port it sees, so the hash ends up different. Please correct me, anyone, if I am wrong... – Adam Mar 13 '20 at 10:43
  • http://bittorrent.org/beps/bep_0005.html#id2 "Use SHA1 and plenty of entropy to ensure a unique ID." If I understand correctly, the hash doesn't depend on your IP and port. – Latk12 Mar 13 '20 at 11:55
  • I am sorry, but I have to say it - those docs suck big time. They're misleading and incomplete - they caused me a lot of headaches. On the other hand, as much as I hate them I love them - they are still better than nothing. I'd recommend using them as a mile-high overview and, for a real implementation, checking out the source code - https://github.com/arvidn/libtorrent/blob/RC_1_2/src/kademlia/node_id.cpp (a sketch of that ID derivation follows these comments). – Adam Mar 13 '20 at 12:57
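For reference, the node_id.cpp linked in the last comment implements the BEP 42 scheme, in which only the top bits of the node ID are tied to the (masked) external IP plus a small random value; the rest of the ID is random. A rough Python sketch of that derivation (the real code is C++; the third-party crc32c package here is an assumption, since CRC32C is not in the standard library):

```python
# BEP 42-style node ID: prefix derived from masked external IPv4 + random r,
# remainder random, last byte stores the random seed byte.
import os
import socket
import crc32c  # assumption: pip install crc32c

V4_MASK = bytes([0x03, 0x0f, 0x3f, 0xff])

def bep42_node_id(external_ip: str) -> bytes:
    ip = bytearray(a & m for a, m in zip(socket.inet_aton(external_ip), V4_MASK))
    rand = os.urandom(1)[0]
    r = rand & 0x07
    ip[0] |= r << 5                      # mix the 3-bit random value into the IP
    crc = crc32c.crc32c(bytes(ip))
    node_id = bytearray(os.urandom(20))  # start random, then fix the prefix
    node_id[0] = (crc >> 24) & 0xff
    node_id[1] = (crc >> 16) & 0xff
    node_id[2] = ((crc >> 8) & 0xf8) | (node_id[2] & 0x07)
    node_id[19] = rand
    return bytes(node_id)
```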

This may be a malicious node that does not keep its node ID consistent in an effort to get into as many routing tables as possible. It might be doing that for data harvesting or DoS amplification purposes.

Generally you shouldn't put too much trust in anything that remote nodes report, and you should sanitize the data. If a node does not keep its ID consistent, you should remove it from your routing table and disregard the results returned in its replies. I have listed a bunch of possible sanitizing approaches beyond BEP42 in the documentation of my own DHT implementation.
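As a rough sketch of that consistency check (illustrative only, not taken from my implementation; the names routing_table and handle_response are made up):

```python
# If a node's claimed ID ever differs from the ID we stored for its (ip, port),
# evict it from the routing table and ignore the response payload.
routing_table = {}  # (ip, port) -> 20-byte node ID

def handle_response(ip, port, claimed_id, payload):
    key = (ip, port)
    known_id = routing_table.get(key)
    if known_id is not None and known_id != claimed_id:
        del routing_table[key]   # ID changed for the same endpoint: drop the node
        return None              # and discard whatever it returned
    routing_table[key] = claimed_id
    return payload
```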

Another possibility is that node B simply changed its ID in the meantime (e.g. due to a restart) and node A either has not updated it yet or does not properly keep track of ID changes. But this shouldn't happen too frequently.

And I can see this is not an isolated case.

Overall I would expect this behavior only from a tiny fraction of the network. So you should compare the number of unique IP addresses sending bogus responses to the number of unique IPs sending sane ones. It is easy to get these kinds of statistics wrong if your implementation is naive and lets malicious nodes trap it into contacting even more malicious nodes.
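A trivial way to collect that statistic (again just a sketch, with made-up names):

```python
# Track unique source IPs whose replies were consistent vs. inconsistent.
sane_ips, bogus_ips = set(), set()

def record(ip, reported_id, claimed_id):
    (bogus_ips if reported_id != claimed_id else sane_ips).add(ip)

def mismatch_ratio():
    total = len(sane_ips | bogus_ips)
    return len(bogus_ips) / total if total else 0.0
```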

That said, during a lookup you may see this more frequently in the terminal phase, when you get polluted data from nodes that do not sanitize their routing tables properly. As one example, old libtorrent versions did not (see the related issue; note that I'm not singling out libtorrent here, many implementations are crappy in this area).

the8472