I am developing fault tolerance mechanisms for a distributed application in Rust. I need to simulate the failure of one node (and eventually more). The kind of failure I want to simulate is a node crash: the application should exit completely, with an error, in a controlled manner. I want to choose which node fails and when it does (as much as possible).
The nodes of the application communicate with each other peer-to-peer. Each node runs two threads, and ideally both would be terminated.
In my testing environment, each node runs on its own thread (which creates a second one) on my laptop, with a network port assigned to each node.
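For reference, the test setup looks roughly like the sketch below. This is a simplified illustration, not my actual code: `run_node` and `start_nodes` are hypothetical names, the worker body is a placeholder, and binding to port 0 (so the OS picks a free port) is an assumption made to keep the example runnable anywhere; in practice each node has a fixed port.

```rust
use std::net::TcpListener;
use std::thread;

// Hypothetical node entry point: binds its own port, then starts a second thread.
fn run_node(id: usize) -> std::io::Result<u16> {
    // Assumption: port 0 lets the OS choose a free port; a real setup would use fixed ports.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let port = listener.local_addr()?.port();

    // Second thread of the node (placeholder for the real work).
    let worker = thread::spawn(move || {
        println!("node {id}: worker thread running");
    });
    worker.join().expect("worker thread panicked");
    Ok(port)
}

// Starts `n` nodes, one thread each, and returns the port each node bound.
fn start_nodes(n: usize) -> Vec<u16> {
    let handles: Vec<_> = (0..n)
        .map(|id| thread::spawn(move || run_node(id)))
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("node thread panicked").expect("bind failed"))
        .collect()
}

fn main() {
    let ports = start_nodes(3);
    println!("nodes used ports {ports:?}");
}
```

All nodes share one OS process here, which is why I cannot simply exit the process to crash a single node.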
A preliminary idea would be to exit a thread at random with a given probability. This does not give me the control I need to make only one node exit, at the exact moment in the application where I want to test my fault tolerance mechanisms. Also, as far as I know, this would leave the node's second thread running.
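To illustrate the problem with this idea, here is a minimal sketch under some assumptions. The names `next_random` and `worker_outlives_node` are hypothetical, and the small xorshift generator is only there to avoid an external crate. It relies on the fact that a thread created with `std::thread::spawn` is detached when its handle is dropped, so it keeps running after the thread that spawned it returns (as long as the process itself is alive).

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::Duration;

// Tiny xorshift64 generator returning a value in [0, 1); illustration only.
fn next_random(state: &mut u64) -> f64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    (x >> 11) as f64 / (1u64 << 53) as f64
}

// Runs one node whose main thread exits with probability `p` per step (`p` must be > 0).
// Returns true if the second thread was still running after the main thread exited.
fn worker_outlives_node(seed: u64, p: f64) -> bool {
    let worker_running = Arc::new(AtomicBool::new(false));
    let stop = Arc::new(AtomicBool::new(false));
    let (started_tx, started_rx) = mpsc::channel();

    let node = {
        let worker_running = Arc::clone(&worker_running);
        let stop = Arc::clone(&stop);
        thread::spawn(move || {
            // Second thread of the node: keeps working until told to stop.
            thread::spawn(move || {
                worker_running.store(true, Ordering::SeqCst);
                started_tx.send(()).unwrap();
                while !stop.load(Ordering::SeqCst) {
                    thread::sleep(Duration::from_millis(1));
                }
                worker_running.store(false, Ordering::SeqCst);
            });
            // Wait until the worker has started before rolling the dice.
            started_rx.recv().unwrap();
            let mut rng = seed.max(1); // xorshift state must be non-zero
            // Simulated "crash": return from the node's main thread at a random step.
            while next_random(&mut rng) >= p {}
        })
    };

    node.join().unwrap();
    let alive = worker_running.load(Ordering::SeqCst);
    stop.store(true, Ordering::SeqCst); // clean up the leftover worker
    alive
}

fn main() {
    let alive = worker_outlives_node(42, 0.5);
    println!("worker still running after node thread exited: {alive}");
}
```

In this sketch the worker is always still running once the node's main thread has returned, which is the behaviour I want to avoid. The step at which the exit happens also depends on the random sequence rather than on a point I choose in the application.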
I am looking for a way to simulate a node crash that I can control, so that I can reproduce the same crash whenever I need to.