
Okay, so let's say I have a DHT running with 10 clients and a bunch of data in it.

Wouldn't it be relatively easy for a malicious client to run an alternate version of my program that could perform potentially destructive actions on my data (such as replacing a key, deleting a key, altering data, deleting my entire DHT, etc.)?

How do I prevent this from happening?

I can only think of:

  • Checksum-verifying the program and only allowing verified clients to connect. But couldn't a modified client fake the checksum?

  • Verifying each DHT client with some kind of key.

Does anyone know how to protect against this? Thanks in advance.

dessalines

1 Answer


Don't try to verify the software running a DHT node itself; instead, verify the behavior and the data that nodes provide, as needed.

There are several ways to do that, depending on the intended use of the data. Without knowing the exact use case of your DHT, I can only provide some general guidelines:

  • If you have some trust anchor for the data outside the DHT itself (e.g. user A gives user B a link that includes a pubkey), then incorporate that into your protocol design: derive DHT lookup keys from the pubkey, use it to verify signatures on the data so forgery becomes impossible, etc. (see the signed-record sketch after this list)
    • in some circumstances it can be useful to use elliptic curve cryptography and have node ID == node's pubkey
  • use redundancy: publish to multiple nodes. You should do that anyway, since nodes can go offline or become unreachable
  • another form of redundancy: asymmetric APIs where each originator publishes a single value but target nodes return lists of the values that have been stored on them, possibly incorporating the IPs of the originating nodes
  • reduce incentive for attackers to corrupt the DHT in the first place:
    • collaborative publishing - if multiple participants in the network have an interest in publishing the same chunk of data, it becomes harder for an attacker to compete with them
    • easy-to-regenerate data - if someone DDoSes a small portion of the keyspace, just republish the data; it's the publisher's task to keep stuff in the DHT, so don't rely on other nodes maintaining it forever
    • indirection / data is just a pointer - if the DHT doesn't contain any "juicy" data itself that an attacker might want to delete/replace, but only a pointer to the actual data that can easily be replaced with another one, it becomes less useful to attack
    • pollution-resistant data - 1 good entry among 20 bad entries should still be useful
    • keep complexity low - fragile data structures spanning multiple DHT nodes or indirected lookups are easier to break than a single few-byte string stored on dozens of nodes
    • provide a way to verify data later at a higher protocol level - treat everything obtained from the DHT as tentative (see the content-hash sketch after this list)
  • make it difficult for attackers to dominate anything but a small fraction of the DHT keyspace
    • only 1 entry per IP in routing tables
    • only 1 entry per IP per key in <key, List<value>> tables (sketched after this list)
    • restrict node IDs by deriving them from the node's external IP and/or using hashcash
    • when using UDP: require a 3-way handshake for write operations to prevent IP spoofing (see the token sketch after this list)
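
For the pubkey-as-trust-anchor point, here's a minimal Python sketch using the `cryptography` package. The record layout and the truncation to a 160-bit Kademlia-style ID are assumptions for illustration, not part of any particular DHT:

```python
from hashlib import sha256

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat


def dht_key(pubkey_raw: bytes) -> bytes:
    # Derive the DHT lookup key from the pubkey itself (self-certifying).
    return sha256(pubkey_raw).digest()[:20]


def make_record(sk: Ed25519PrivateKey, value: bytes) -> dict:
    # Bundle pubkey and signature with the value so any node can check it.
    pubkey_raw = sk.public_key().public_bytes(Encoding.Raw, PublicFormat.Raw)
    return {"pubkey": pubkey_raw, "sig": sk.sign(value), "value": value}


def verify_record(lookup_key: bytes, record: dict) -> bool:
    # Reject records stored under a key their pubkey doesn't own,
    # and records whose signature doesn't check out.
    if dht_key(record["pubkey"]) != lookup_key:
        return False
    try:
        pk = Ed25519PublicKey.from_public_bytes(record["pubkey"])
        pk.verify(record["sig"], record["value"])
        return True
    except InvalidSignature:
        return False


sk = Ed25519PrivateKey.generate()
record = make_record(sk, b"some payload")
key = dht_key(record["pubkey"])    # store `record` under `key` in the DHT
assert verify_record(key, record)  # every reader runs the same check
```

With this layout a malicious node can still withhold a record or serve a stale one, but it cannot forge or alter the value without the private key.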
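
To illustrate the content-hash variant of "data is just a pointer, verify at a higher level": if the lookup key is already a content hash you obtained through a trusted channel, every DHT result can be checked after the fact. `dht_get` and `download` below are hypothetical stand-ins for your lookup and transfer layers:

```python
import hashlib


def fetch_via_dht(content_hash: bytes) -> bytes:
    # Treat every DHT entry as tentative; the content hash is the
    # real authority, so polluted entries are simply skipped.
    for pointer in dht_get(content_hash):  # hypothetical: candidate locations
        data = download(pointer)           # hypothetical: fetch the actual bytes
        if hashlib.sha256(data).digest() == content_hash:
            return data
    raise LookupError("no entry under this key yielded matching data")
```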
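
The one-value-per-IP-per-key rule can be as simple as the following sketch (the class and method names are hypothetical; this is just the table, not a full node):

```python
from collections import defaultdict


class DhtStore:
    # A <key, List<value>> table that keeps at most one value
    # per source IP under each key.

    def __init__(self) -> None:
        self._table: dict[bytes, dict[str, bytes]] = defaultdict(dict)

    def put(self, key: bytes, source_ip: str, value: bytes) -> None:
        # A repeat put from the same IP replaces its own earlier value,
        # so a single IP can never occupy more than one slot under a key.
        self._table[key][source_ip] = value

    def get(self, key: bytes) -> list[bytes]:
        return list(self._table[key].values())
```

An attacker now needs many distinct IP addresses to crowd out honest values under a key, instead of many cheap fake identities.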
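
For the UDP write handshake, the BitTorrent DHT (BEP 5) uses an opaque token: a read response includes a MAC over the requester's IP, and a later write is only accepted if it echoes a still-valid token back. A sketch along those lines (the rotation scheme and hash choice are assumptions):

```python
import hashlib
import hmac
import os


class TokenIssuer:
    # Write tokens prove the requester actually receives traffic
    # at its claimed source address, defeating IP spoofing.

    def __init__(self) -> None:
        self._secrets = [os.urandom(16), os.urandom(16)]  # current + previous

    def rotate(self) -> None:
        # Call every few minutes; tokens from the previous period stay valid.
        self._secrets = [os.urandom(16), self._secrets[0]]

    def issue(self, ip: str) -> bytes:
        # Returned in the reply to a read request; only the true owner
        # of `ip` ever sees it.
        return hmac.new(self._secrets[0], ip.encode(), hashlib.sha1).digest()

    def check(self, ip: str, token: bytes) -> bool:
        # Accept tokens minted with the current or the previous secret.
        return any(
            hmac.compare_digest(
                hmac.new(s, ip.encode(), hashlib.sha1).digest(), token
            )
            for s in self._secrets
        )
```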

In general: treat all nodes as unreliable and buggy, and some (but not all) of them as malicious.
Trust but verify.

the8472