Simulating node failure for testing purposes

Question

I am developing fault tolerance mechanisms for a distributed application in Rust. I need to simulate the failure of one node (and eventually more). The kind of failure to simulate is a node crash: I want the application to exit completely, with an error, in a controlled manner. I want to choose which node fails and when it does (as much as possible).

The nodes of the application communicate with each other peer-to-peer. Each node runs two threads, and ideally both would be terminated.

In my testing environment, each node runs on a thread (which in turn creates a second one) on my laptop, with a network port assigned to each node.

A preliminary idea is to have a thread exit at random with a given probability. That does not give me the control I need: I want to take down exactly one node, at the exact point in the application where I want to test my fault tolerance mechanisms. It would also leave the node's second thread running (as far as I know).

I am looking for a way to simulate a node crash that I can control, so that I can reproduce the same crash whenever I need to.

Just kill the process? oh *"In my testing environment I have each node running on a thread"* maybe not then, but if you switched to spawning child processes then it'd work — kmdreko, Nov 21 '22 at 21:57
There's not really a way to "kill" a thread like you want without cooperation from the target thread. One way is to use an `AtomicBool` flag that tells the thread to die. — PitaJ, Nov 21 '22 at 22:52
So all "nodes" are running in the same process and this is not really a distributed system? (If it was, I would have cut the connection between nodes at the network level…). — Caesar, Nov 22 '22 at 02:55
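
The `AtomicBool` suggestion in the comments above could look roughly like the sketch below. It assumes each node's two threads run a loop and can check a shared flag between units of work. `Crashed`, `NodeHandle`, `spawn_node` and `worker` are hypothetical names made up for illustration, and the `sleep` call stands in for the real per-node work. Note that a thread blocked in a long I/O call would not notice the flag until that call returns.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Error returned by a node thread that was told to crash (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct Crashed(pub usize);

/// Handle the test harness keeps for each simulated node (hypothetical type).
pub struct NodeHandle {
    id: usize,
    kill: Arc<AtomicBool>,
    threads: Vec<JoinHandle<Result<(), Crashed>>>,
}

/// Loop shared by both threads of a node: check the kill flag, then do one
/// unit of work. Returning drops everything the thread owns.
fn worker(id: usize, kill: Arc<AtomicBool>) -> Result<(), Crashed> {
    loop {
        if kill.load(Ordering::SeqCst) {
            return Err(Crashed(id));
        }
        // Placeholder for the node's real work (assumption for this sketch).
        thread::sleep(Duration::from_millis(5));
    }
}

/// Start a node made of two threads that share one kill flag.
pub fn spawn_node(id: usize) -> NodeHandle {
    let kill = Arc::new(AtomicBool::new(false));
    let threads = (0..2)
        .map(|_| {
            let kill = Arc::clone(&kill);
            thread::spawn(move || worker(id, kill))
        })
        .collect();
    NodeHandle { id, kill, threads }
}

impl NodeHandle {
    /// Identifier of the node this handle controls.
    pub fn id(&self) -> usize {
        self.id
    }

    /// True while neither thread of this node has exited.
    pub fn is_running(&self) -> bool {
        self.threads.iter().all(|t| !t.is_finished())
    }

    /// Simulate a crash: set the flag, then wait for both threads to exit.
    pub fn crash(self) -> Vec<Result<(), Crashed>> {
        self.kill.store(true, Ordering::SeqCst);
        self.threads
            .into_iter()
            .map(|t| t.join().expect("worker thread panicked"))
            .collect()
    }
}
```

A test would keep one `NodeHandle` per node and call `crash()` on the chosen node at the chosen point in the scenario, which makes the failure reproducible and takes down both of that node's threads while the other nodes keep running. If the nodes were separate child processes instead, killing the process (as the first comment suggests) would avoid the need for cooperation from the threads.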


0 Answers