I am using Fabric to run an experiment on an Amazon Web Services EC2 cluster (50 instances). The experiment mainly consists of clients sending requests to my servers.
Because I am testing the scalability of my project, I increase the number of servers while keeping the number of clients the same. In this process, I occasionally come across this error, which interrupts my Fabric task.
If I run the task again, the error does not happen. I read the question No handlers could be found for logger “paramiko.transport”, but it does not really explain why this error terminates my task or why it happens only occasionally.
I also checked the context in which the error occurs, but the last executed commands are not even the same from one failure to the next.
Could someone suggest some debugging techniques to identify where the problem is?
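
One thing I could try first: as far as I understand, the "No handlers could be found" line is printed by Python's logging module when a logger emits a record and no handler is configured, so the real message from paramiko.transport is being dropped. A minimal sketch of attaching a file handler to that logger so the underlying message is recorded, using only the standard logging module. The function name enable_paramiko_logging and the file name paramiko_debug.log are placeholders I made up, not part of Fabric or Paramiko:

```python
import logging


def enable_paramiko_logging(log_path="paramiko_debug.log", level=logging.DEBUG):
    """Attach a file handler to the 'paramiko.transport' logger.

    Hypothetical helper: the name and default path are illustrative only.
    """
    logger = logging.getLogger("paramiko.transport")
    logger.setLevel(level)

    # Write every record to a file, with a timestamp so it can be
    # lined up against the Fabric task output.
    handler = logging.FileHandler(log_path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # Call this once at the top of the fabfile, before any task runs.
    enable_paramiko_logging()
```

With this in place, the record that previously triggered the "No handlers" warning should end up in the log file, which might show whether the failure is a dropped connection, a timeout, or something else.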