
Suppose I have an MPI server and two clients, A and B, both connected to the same MPI server at the same time.

This site states that "If A is connected to B and B to C, then A is connected to C" and that "an error in one may affect the other".

Does this mean that if B crashes, A's MPI calls to the server will be affected too?

If so, is there any way to separate them so they don't affect one another?

ogxing
  • No. MPI currently has no real fault tolerance; it's designed for (comparatively) short-running, peer-to-peer, closely coupled computation. It's not really a good choice for long-running client-server work. – Jonathan Dursi May 27 '15 at 12:48

2 Answers


I have a somewhat more positive view of MPI fault tolerance than Jonathan Dursi does, but only slightly.

You can instruct MPI to report errors by returning error codes instead of aborting. It's not entirely clear what you would do with that information, but in some cases it might be possible to retry or take an alternate approach.

This paper gets cited a ton and talks about the subset of MPI one might be able to use portably and still maintain fault tolerance: http://www.mcs.anl.gov/~lusk/papers/fault-tolerance.pdf

Sorry for sending a slide deck instead of actual content, but Wes Bland did a lot of work on this topic (and I'm sure he'll provide a better answer in a few minutes) http://www.mcs.anl.gov/~wbland/slides/jlpc13.pdf

Rob Latham

As summoned by Rob Latham...

MPI won't guarantee that you can still communicate with other processes after a failure, but there has been non-standard work to try to enable that usage model.

User Level Failure Mitigation is one way that lets you detect failures and continue executing. The site linked has some examples and use cases along with the full spec for ULFM. You might not need everything that it provides if all you want is to detect failures and continue. You can download the branch of Open MPI at that website or you can use the released versions of MPICH. For either one, use the MPIX_ prefix for the new functions.

All that being said, as Jonathan Dursi mentioned in the comment above, MPI may not be right for you if you're looking for a client/server model. Yes, it's possible, but it's not really optimized for that use case and you might have better luck using a different communication mechanism.

Wesley Bland