
I am running an analysis on a cluster, and internally it spawns some processes. Most of the time it works, but sometimes I get the following error:

mm_xpmem.c:135  UCX  ERROR   failed to attach xpmem apid 0x600005c0e offset 0x2b8cb9183000 length 12288: No such file or directory
mm_ep.c:172  UCX  ERROR   mm ep failed to connect to remote FIFO id 0x2b8cb9183000: Input/output error

The error occurs randomly. What causes it, and how can it be resolved?

OpenMPI: 4.0.5
mpi4py: 3.1.3

Pavan

1 Answer


I don't know whether this is an option in your case, but having an administrator remove the xpmem kernel module fixed a similar problem I had with OpenMPI 4.1.1.1.
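If removing the kernel module is not possible, a user-level variant of the same idea might be to tell UCX not to use the xpmem transport at all. The sketch below is untested against the setup in the question. It assumes the installed UCX accepts a leading `^` in `UCX_TLS` as an exclusion list, and the helper name `exclude_ucx_transports` is made up for illustration.

```python
import os


def exclude_ucx_transports(env, *names):
    """Set UCX_TLS in `env` so that UCX skips the named transports.

    Hypothetical helper. Assumes the installed UCX treats a leading "^"
    in UCX_TLS as "use every transport except these".
    """
    env["UCX_TLS"] = "^" + ",".join(names)
    return env


# The variable has to be set before MPI (and therefore UCX) is initialized,
# which with mpi4py normally happens on `from mpi4py import MPI`.
exclude_ucx_transports(os.environ, "xpmem")

# from mpi4py import MPI  # import only after UCX_TLS is set
```

Child processes started from this interpreter inherit the variable. If the job is started through `mpirun`, exporting it to all ranks with `mpirun -x UCX_TLS=^xpmem ...` may be the more reliable route.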