I'm running distributed training with MPI on a hosted platform. During training, the following message is printed over and over:
Read -1, expected 5017600, errno = 1
Read -1, expected 5017600, errno = 1
Read -1, expected 5017600, errno = 1
Read -1, expected 5017600, errno = 1
Read -1, expected 5017600, errno = 1
...
After some investigation, I found that this is caused by Docker's default seccomp profile (errno = 1 is EPERM, "operation not permitted"). If I run the container with --cap-add=SYS_PTRACE,
the messages go away.
However, I cannot add flags to docker run,
because I don't control how the containers are launched: the platform launches them. So, is there a way to change the ptrace
setting either in the Dockerfile or from inside the running container?
I also found that running MPI with btl_vader_single_copy_mechanism set to none
suppresses these messages, but it hurts performance, so that is not an option.
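For anyone who wants to check the same thing from inside a container, here is a minimal sketch of a probe. It assumes Linux with a libc that exports process_vm_readv, which, as far as I understand, is the call behind the single-copy (CMA) mechanism. The names probe_cma and IOVec are my own, not part of MPI or Docker. The probe only reads from its own process, so it detects a seccomp-style block but says nothing about other restrictions on reading a sibling process.

```python
import ctypes
import errno
import os


class IOVec(ctypes.Structure):
    # Mirrors struct iovec from <sys/uio.h>.
    _fields_ = [("iov_base", ctypes.c_void_p), ("iov_len", ctypes.c_size_t)]


def probe_cma():
    """Try process_vm_readv on our own PID.

    Returns (bytes_read, errno_value); errno_value is 0 on success.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    libc.process_vm_readv.restype = ctypes.c_ssize_t
    libc.process_vm_readv.argtypes = [
        ctypes.c_int,            # pid
        ctypes.POINTER(IOVec),   # local_iov
        ctypes.c_ulong,          # liovcnt
        ctypes.POINTER(IOVec),   # remote_iov
        ctypes.c_ulong,          # riovcnt
        ctypes.c_ulong,          # flags
    ]

    src = ctypes.create_string_buffer(b"probe")   # 6 bytes including the NUL
    dst = ctypes.create_string_buffer(len(src))
    local = IOVec(ctypes.addressof(dst), len(src))
    remote = IOVec(ctypes.addressof(src), len(src))

    n = libc.process_vm_readv(
        os.getpid(), ctypes.byref(local), 1, ctypes.byref(remote), 1, 0
    )
    err = ctypes.get_errno() if n < 0 else 0
    return n, err


if __name__ == "__main__":
    n, err = probe_cma()
    if n < 0:
        name = errno.errorcode.get(err, "unknown")
        print(f"process_vm_readv failed: errno = {err} ({name})")
    else:
        print(f"process_vm_readv succeeded, read {n} bytes")
```

If this reports errno = 1 without SYS_PTRACE and succeeds with it, that would match the behaviour I described above.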
Any help would be much appreciated!