I am using Amazon EC2 nodes and running an MPI parallel program in C. I am using StarCluster to manage the instances. The program compiles fine using mpicc. The executable is then on a mounted space shared by all nodes. However, when I run the executable using mpirun, sometimes old versions of the executable load instead.
For example, if I have a master and 9 nodes, and the program prints "Version 1.0", I get 10 lines of "Version 1.0". If I update the code to print "Version 1.1", recompile on the master, and run it immediately, I get one line of "Version 1.1" and 9 lines of "Version 1.0". If I wait another minute or two before running, I get all ten lines of "Version 1.1".
Why is there such a delay before the other nodes pick up the updated executable? Is it an issue with mpicc, or with the way I am mounting the shared space?