
I am developing fault tolerance mechanisms for a distributed application in Rust. I need to simulate the failure of one node (and eventually more). The kind of failure to simulate is a node crash: I want the application to exit completely, with an error, in a controlled manner. I want to choose which node fails and, as far as possible, when it fails.

The nodes of the application communicate with each other peer-to-peer. Each node runs two threads, and ideally both would be terminated.

In my testing environment, each node runs on a thread (which creates a second one) on my laptop, with a network port assigned to each node.

A preliminary idea is to exit a thread at random with a given probability. This does not give me the control I need to exit only one node, at the exact moment in the application where I want to test my fault tolerance mechanisms. Also, as far as I know, it would leave the node's second thread running.

I am looking for a way to simulate a node crash that I can control, so that I can reproduce the same crash whenever I need to.

javier
  • Just kill the process? oh *"In my testing environment I have each node running on a thread"* maybe not then, but if you switched to spawning child processes then it'd work – kmdreko Nov 21 '22 at 21:57
  • There's not really a way to "kill" a thread like you want without cooperation from the target thread. One way is to use an `AtomicBool` flag that tells the thread to die. – PitaJ Nov 21 '22 at 22:52
  • So all "nodes" are running in the same process and this is not really a distributed system? (If it was, I would have cut the connection between nodes at the network level…). – Caesar Nov 22 '22 at 02:55
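The `AtomicBool` suggestion from the comments could look roughly like the sketch below. It is only an illustration under assumptions: the `Node` type, its `spawn`, `ticks` and `crash` methods, and the 1 ms sleep standing in for real work are all hypothetical and not taken from the question's code. Both threads of a node share one flag, check it on every loop iteration, and return an error once the test harness sets it, so the harness decides which node crashes and when.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical handle to one simulated node: two threads sharing a crash flag.
pub struct Node {
    crash_flag: Arc<AtomicBool>,
    ticks: Arc<AtomicU64>,
    workers: Vec<JoinHandle<Result<(), String>>>,
}

impl Node {
    /// Start both threads of the node. The loop body is a placeholder for real work.
    pub fn spawn(id: usize) -> Node {
        let crash_flag = Arc::new(AtomicBool::new(false));
        let ticks = Arc::new(AtomicU64::new(0));
        let workers = (0..2usize)
            .map(|t| {
                let crash_flag = Arc::clone(&crash_flag);
                let ticks = Arc::clone(&ticks);
                thread::spawn(move || -> Result<(), String> {
                    loop {
                        // Cooperative check: stop with an error once the flag is set.
                        if crash_flag.load(Ordering::SeqCst) {
                            return Err(format!("node {} thread {}: simulated crash", id, t));
                        }
                        ticks.fetch_add(1, Ordering::SeqCst); // stand-in for real work
                        thread::sleep(Duration::from_millis(1));
                    }
                })
            })
            .collect();
        Node { crash_flag, ticks, workers }
    }

    /// Number of loop iterations completed so far by both threads together.
    pub fn ticks(&self) -> u64 {
        self.ticks.load(Ordering::SeqCst)
    }

    /// Crash this node on demand: set the flag, wait for both threads,
    /// and return the error each one exited with.
    pub fn crash(self) -> Vec<String> {
        self.crash_flag.store(true, Ordering::SeqCst);
        self.workers
            .into_iter()
            .map(|h| h.join().expect("worker panicked").unwrap_err())
            .collect()
    }
}
```

A test would call `Node::spawn` once per node, drive the application to the point of interest, and then call `crash()` on the chosen node while the others keep running. The threads only notice the flag between iterations, so a thread stuck in a blocking network call would need a read timeout or a similar mechanism before it reaches the check. This is also not a hard kill: if the crash must be abrupt, the child-process approach from the first comment is the closer match.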

0 Answers