Searching for a given target: how to widen the search?

Question

Given the target ID x...19x, suppose the program runs the recursive query and builds a table. Once no more nodes are left to interrogate, it ends up in a situation where there are fewer than 8 closest nodes in the result.

How could I widen the search so that the table returns at least 8 nodes for any given ID?

Can I simply take the target ID (info hash) and search for info_hash+1 / info_hash-1?

If so, in programming terms, how do I increment / decrement an ID given in this form: "afe0..."?

If not, what could be done to reach that number of redundant nodes?

A response to a *find_node* query should contain the K=8 closest nodes the respondent knows. So the only ways to get fewer are that there are fewer than 8 nodes in total in the DHT, or that you only get responses from severely broken implementations. — Encombe, Jul 26 '17 at 11:34
Yeah, I agree, but I have currently hardened the rules for adding nodes to my table when I perform a BEP44 get request. — , Jul 26 '17 at 13:45
OK, dealing with BEP44 is a different question. To see the 8 closest nodes to a queried node, you can set the target to the queried node's ID; however, that risks triggering some protections against eclipse attacks. So I would recommend setting the target to `(queried node's ID) XOR (0x0000000C 0F0F0F0F 0F0F0F0F 0F0F0F0F 0F0F0F0F)` instead. — Encombe, Jul 26 '17 at 16:18
It would seem silly to make protections conditional on precise target IDs sent by queries. Such a "defense" would be like raising a single pike in the desert and hoping your attackers impale themselves on it. — the8472, Jul 26 '17 at 20:09
@the8472 Epic fantasy tale. Only an aeon or so off. Blind attacks are quite futile. — Encombe, Jul 27 '17 at 08:08
@mh-cbon Support for BEP42 and BEP44 is strongly correlated. AFAIK both utorrent and libtorrent implemented BEP42+44 in the same release. Over 2/3 of the nodes support BEP42. — Encombe, Jul 27 '17 at 08:09
@Encombe i'm not talking about the attackers. i'm talking about the defense. it's silly to implement because it's so easily defeated. What does it even defend against? — the8472, Jul 27 '17 at 08:42
BEP42 and 44 may be correlated, but that does not provide any new information since you can already determine which nodes support BEP44 simply by checking whether they return a token when you send a `get`. — the8472, Jul 27 '17 at 09:46
@the8472 I find it silly to dismiss something you haven't seen or understood and instead build a strawman and viciously attack it. BEP42/44 is about fast pre-selection. Checking BEP42 takes a few µs. A BEP44 get can take seconds. Again, you seem more interested in arguing than helping. I'm out. @mh-cbon I'm confident you have some ideas now and will find a solution. Good luck :-) — Encombe, Jul 27 '17 at 10:34
Ah, I feel kind of bad reading such strong comments. I see no reason for them, given that we all just like this tech and are happy to see it working IRL. — , Jul 27 '17 at 10:42
@Encombe It is not necessary to do find_node queries. You can always do get queries, even for off-target probes. Therefore you can determine support based on token support. That's why I say BEP42 support does not provide additional information. "I find it silly to dismiss something you haven't seen or understood and instead build a strawman and viciously attack." Then please provide more details. I know many attacks on DHT nodes; none of those that I am aware of can be prevented by filtering the target IDs of requests. — the8472, Jul 27 '17 at 11:01

the8472 · Answer 1 · 2017-07-26T21:49:28.363

I think the problem is non-trivial due to the XOR distance. What you really want is "OK, you already gave me the 1..N-farthest nodes you know about, now tell me about N+1..M". There is no DHT query for that.

And this question does not map to any single query you could ask a single node out of your result set.
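To make the XOR distance concrete, here is a minimal Python sketch. The helper names (`parse_id`, `format_id`, `xor_distance`) are made up for illustration and are not part of any DHT library. It treats a 160-bit ID written in hex as a plain integer, which is also one way to handle an ID given as a string like "afe0...", and it reproduces the first `d:` value of the simulation output further down in this answer from its `t:` and `c:` values.

```python
ID_BITS = 160  # BitTorrent DHT node IDs and info hashes are 160 bits wide


def parse_id(text: str) -> int:
    """Parse an ID written in hex, optionally split into groups by spaces."""
    return int(text.replace(" ", ""), 16)


def format_id(value: int) -> str:
    """Format a 160-bit integer as five space-separated 32-bit hex groups."""
    digits = f"{value:040X}"
    return " ".join(digits[i:i + 8] for i in range(0, 40, 8))


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: the bitwise XOR of the two IDs, read as an integer."""
    return a ^ b


target = parse_id("1FBD4155 B667C234 90E0B021 FF837239 38FF5A2C")
closest = parse_id("1FB6AC2D CA183942 6BE2B523 2BD998F7 0ACB59B2")
distance = format_id(xor_distance(target, closest))
print(distance)

# Numeric neighbours are not necessarily XOR neighbours:
# 0x0F + 1 == 0x10, yet the XOR distance between them is 0x1F.
carry_case = xor_distance(0x0F, 0x0F + 1)
print(hex(carry_case))
```

Because a carry can flip many bits at once, stepping a target by +1 or -1 does not move it by a fixed XOR distance, which is part of why simply incrementing the info hash is not a reliable way to widen a lookup.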

I ran a little simulation to test the "ask the 8th-furthest node about its own node ID" approach and here's the result

t:1FBD4155 B667C234 90E0B021 FF837239 38FF5A2C
c:1FB6AC2D CA183942 6BE2B523 2BD998F7 0ACB59B2 d:000BED78 7C7FFB76 FB020502 D45AEACE 3234039E
c:1FADC99B D3CAC04B 9468220D D779F063 DB605C52 d:001088CE 65AD027F 0488922C 28FA825A E39F067E
c:1FAFDD9A 74CB2535 5256CF21 A9B0AB3A 7D6752D3 d:00129CCF C2ACE701 C2B67F00 5633D903 459808FF
c:1FA9B8DA 9940F605 272E0B72 8057B89B 259E9D6D d:0014F98F 2F273431 B7CEBB53 7FD4CAA2 1D61C741
c:1FA7B21D D2183104 3BC1CEC5 968FB208 A3B64A34 d:001AF348 647FF330 AB217EE4 690CC031 9B491018
c:1FA132B5 9F044131 A4C2FB18 11727030 D5912386 d:001C73E0 29638305 34224B39 EEF10209 ED6E79AA
c:1FA06A42 09653EDB CB913184 6C1FB8DD 39CD3661 d:001D2B17 BF02FCEF 5B7181A5 939CCAE4 01326C4D
c:1FA30C2F 7FA17089 BA5C85CA CDE555A1 74F6AD19 d:001E4D7A C9C6B2BD 2ABC35EB 32662798 4C09F735
expecting to find next
e:1F9DD0D4 B21795DC 82298E53 E594D647 0353F0C3 d:00209181 047057E8 12C93E72 1A17A47E 3BACAAEF
asking 1FA30C2F 7FA17089 BA5C85CA CDE555A1 74F6AD19 with its own ID
it returned
n:1FA30C2F 7FA17089 BA5C85CA CDE555A1 74F6AD19 d:001E4D7A C9C6B2BD 2ABC35EB 32662798 4C09F735
n:1FA132B5 9F044131 A4C2FB18 11727030 D5912386 d:001C73E0 29638305 34224B39 EEF10209 ED6E79AA
n:1FA06A42 09653EDB CB913184 6C1FB8DD 39CD3661 d:001D2B17 BF02FCEF 5B7181A5 939CCAE4 01326C4D
n:1FA7B21D D2183104 3BC1CEC5 968FB208 A3B64A34 d:001AF348 647FF330 AB217EE4 690CC031 9B491018
n:1FA9B8DA 9940F605 272E0B72 8057B89B 259E9D6D d:0014F98F 2F273431 B7CEBB53 7FD4CAA2 1D61C741
n:1FAFDD9A 74CB2535 5256CF21 A9B0AB3A 7D6752D3 d:00129CCF C2ACE701 C2B67F00 5633D903 459808FF
n:1FADC99B D3CAC04B 9468220D D779F063 DB605C52 d:001088CE 65AD027F 0488922C 28FA825A E39F067E
n:1FB6AC2D CA183942 6BE2B523 2BD998F7 0ACB59B2 d:000BED78 7C7FFB76 FB020502 D45AEACE 3234039E
n:1F8248F8 8CEA3B04 5196FFEE F9B4F6C1 3B3B2707 d:003F09AD 3A8DF930 C1764FCF 063784F8 03C47D2B
n:1F8F0556 D1B0BCBF 42D54567 825058D8 155BA5E4 d:00324403 67D77E8B D235F546 7DD32AE1 2DA4FFC8
n:1F8D9186 86C1AEFE A2C24C73 59F5A2F5 D4C2FA5E d:0030D0D3 30A66CCA 3222FC52 A676D0CC EC3DA072
n:1F8C71EF C8B0A12E 40B5233C 680D2373 A3D730A1 d:003130BA 7ED7631A D055931D 978E514A 9B286A8D
n:1F93C42A BA85A26C 184185B1 A79A6E60 253DBC2D d:002E857F 0CE26058 88A13590 58191C59 1DC2E601
n:1F96B634 A043FC17 616A549F F521E9F9 4F5600FD d:002BF761 16243E23 F18AE4BE 0AA29BC0 77A95AD1
n:1F953B74 8977DCE8 8636338B A2EC4ED2 14A83E35 d:00287A21 3F101EDC 16D683AA 5D6F3CEB 2C576419
n:1F9A39EF 59885496 ED794C02 49545D6C 92565959 d:002778BA EFEF96A2 7D99FC23 B6D72F55 AAA90375

65 failures in 1000 runs

Note that this is when the queried node returns 16 contacts. If it only returns 8, the failure rate goes up to ~20%. And these failures are not uncorrelated events; they are collective behavior due to the way shared prefixes work. In other words, just querying other nodes of the 8-closest set might not significantly improve the chances.

It is fairly obvious what the problem is: the queried node's own ID may be positioned so that it happens to consider all the nodes that you have already visited closer to its own ID than the next-closest node that you don't know about yet.

The correct solution is to build a temporary routing table organized so that its home bucket covers the target key that you are interested in, and then to incrementally populate the home bucket and its neighbors (splitting them if needed) until you have enough BEP-44 compliant contacts.

This is a fairly involved approach. Normally backtracking and asking contacts you have not asked yet about the target ID should provide you more than 8 closest nodes because many implementations simply hand out more than 8 contacts.
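The backtracking idea can be sketched roughly as follows. This is a toy model in Python, not code from any real DHT implementation: `widen_lookup`, `toy_query`, and the `accepts` filter are hypothetical names, IDs are shrunk to 8 bits, every toy node knows every other node and hands out 16 contacts, and "even ID" stands in for "BEP-44 compliant".

```python
from typing import Callable, Iterable, List, Optional

K = 8  # number of closest contacts wanted


def widen_lookup(
    target: int,
    seeds: Iterable[int],
    query: Callable[[int, int], Optional[List[int]]],
    accepts: Callable[[int], bool],
    k: int = K,
    max_queries: int = 64,
) -> List[int]:
    """Keep asking the closest unvisited contact about the target until
    k accepted responders are known and no unvisited contact is closer."""

    def dist(node: int) -> int:
        return node ^ target

    candidates = set(seeds)  # every contact heard about so far
    visited = set()          # contacts that were already queried
    accepted = []            # responders that passed the filter

    while len(visited) < max_queries:
        unvisited = candidates - visited
        if not unvisited:
            break
        node = min(unvisited, key=dist)
        best = sorted(accepted, key=dist)[:k]
        if len(best) == k and dist(node) > dist(best[-1]):
            break  # closest unvisited contact is farther than the k-th best
        visited.add(node)
        reply = query(node, target)
        if reply is None:
            continue  # no response: move on to the next candidate
        candidates.update(reply)
        if accepts(node):
            accepted.append(node)

    return sorted(accepted, key=dist)[:k]


ALL_NODES = range(256)  # toy network with 8-bit IDs


def toy_query(node: int, target: int) -> List[int]:
    """Toy node: knows everyone else, returns its 16 closest to the target."""
    known = [n for n in ALL_NODES if n != node]
    return sorted(known, key=lambda n: n ^ target)[:16]


result = widen_lookup(0x55, seeds=[200], query=toy_query,
                      accepts=lambda n: n % 2 == 0)
print(result)
```

In this toy run only half of the contacts pass the filter, yet the lookup still ends with 8 accepted nodes because each reply carries 16 contacts. A real lookup would also need timeouts, parallel requests, and token handling, which are left out here.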

The way I saw the problem was more that I wanted to scan around, to exhaust more nodes, to build a stronger view of the table. Right now, for multiple identical sequential queries (I did not try in parallel), distance(target, closest[0]) is very volatile. Reducing it to a threshold could give me more stable results, maybe. But to exhaust more nodes, I need to ask for a different ID; otherwise, using the same bootstrap (seeder), I should receive the same results (+/- a few nodes). So I need to ask it to go in a different direction, right? — , Jul 26 '17 at 22:03
Never mind, I will simply get the closest nodes and test-write to them; then, if I do not have enough, I will apply a shift to the target to reach a new set of nodes, build its table, and test-write to the closest new nodes, repeating until I have correctly written to the nth redundant node. On read I should proceed the same way. — , Jul 26 '17 at 22:21
This answer seems incoherent to me and the suggested 'solution' overly complex and expensive. The triangle property makes far simpler solutions viable. — Encombe, Jul 27 '17 at 08:10
@mh-cbon bootstrap nodes should play no role in this. during any lookup you should have a set of contacts that were returned but you never visited. if you want to broaden a search you can ask those too. — the8472, Jul 27 '17 at 08:44
@Encombe I think the arithmetic involved in such an approach would be equivalent to a bucketized search. You would end up searching multiple prefixes, because each query only involves 2 points: the target ID and the node's own ID (which determines which neighbors it knows about). Then it will sort them by distance; in other words, you only directly influence the 1st result, and the tail of each list is also determined by the node you're querying. That's why the RPCs lack the power to do this with a single query. — the8472, Jul 27 '17 at 08:53
Buckets are really just a partition of the search space into ranges. E.g. `0x0CFA0 - 0x0CFBF` would be the bucket prefix `0x0CFA/15`. So a bucketized search can also be formulated as an interval search. — the8472, Jul 27 '17 at 08:59
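The prefix-to-interval correspondence from the last comment can be checked with a few lines of Python. The function name `bucket_interval` is made up for this sketch, and it assumes the comment's example uses 20-bit IDs with the prefix written left-aligned (so `0x0CFA/15` is passed as `0x0CFA0` with a prefix length of 15).

```python
from typing import Tuple


def bucket_interval(prefix: int, prefix_len: int, id_bits: int) -> Tuple[int, int]:
    """Return the inclusive (low, high) ID range covered by a bucket prefix.

    `prefix` is given left-aligned in an ID of `id_bits` bits; only its top
    `prefix_len` bits are significant, the remaining bits are free.
    """
    free_bits = id_bits - prefix_len
    free_mask = (1 << free_bits) - 1
    low = prefix & ~free_mask   # free bits all zero
    high = low | free_mask      # free bits all one
    return low, high


low, high = bucket_interval(0x0CFA0, prefix_len=15, id_bits=20)
print(hex(low), hex(high))


def in_bucket(node_id: int) -> bool:
    """Interval test that is equivalent to comparing the 15-bit prefix."""
    return low <= node_id <= high
```

Widening a search then amounts to moving to a shorter prefix, which doubles the interval with every bit that is dropped.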
