
This is my first ever post here, so please excuse any formatting issues.

I have an interactive program which spawns external processes and monitors their IO. Things work fine until I spawn something off with "mpiexec", after which STDIN appears to break.

I realize this will be difficult for most people to reproduce, but if anyone sees anything obvious or knows of this problem, please help!

Here's a snippet:

#include <cstdlib>   // system, exit
#include <iostream>
#include <string>

int main()
{
  std::string choice;
  while(std::getline(std::cin,choice)){
    if(!choice.empty()){
      if(choice == "Parallel"){
        system("mpiexec ./aprogram");
      }
      if(choice == "Serial"){
        system("./aprogram");
      }
      // Now the external process is done... so far, so good
      std::cout << "Program is done. Press ENTER to continue."
                << std::endl;
      // This next line *works* if the external process was serial
      // But *fails* when "mpiexec" was invoked
      std::getline(std::cin,choice);
      if(std::cin.eof()){
        std::cout << "STDIN has been closed." << std::endl;
        exit(1);
      }
    }
  }
}

I have tried many things, e.g. pipes, explicit forking, and meticulous descriptor management. The weirdest thing is that if I dup stdin to save it and then restore it after "mpiexec" returns, I no longer get EOF on std::cin, but std::getline(std::cin,...) no longer blocks either! The program goes into an infinite loop, reading zero bytes from std::cin in the std::getline call.
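
For readers who want to try the dup-and-restore approach described above, here is a minimal sketch of what it might look like. The helper names `save_fd` and `restore_fd` are made up for illustration and are not from my actual program. One thing worth keeping in mind: descriptors created with dup share a single open file description, so file status flags such as O_NONBLOCK are shared too, and restoring a saved copy would not undo a change to those flags.

```cpp
#include <cassert>
#include <unistd.h>   // dup, dup2, close

// Hypothetical helper: duplicate `fd` so it can be restored later.
// Returns the new descriptor, or -1 on error (errno is set by dup).
int save_fd(int fd) {
    assert(fd >= 0);
    return dup(fd);
}

// Hypothetical helper: make `fd` refer again to whatever `saved` refers to,
// then release the saved copy. Returns true on success.
bool restore_fd(int saved, int fd) {
    assert(saved >= 0 && fd >= 0);
    if (dup2(saved, fd) == -1) return false;
    if (saved != fd) close(saved);
    return true;
}
```

In the snippet above, this would mean calling `save_fd(STDIN_FILENO)` before `system("mpiexec ./aprogram")` and `restore_fd(saved, STDIN_FILENO)` after it returns.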

If, while the external process is running under mpiexec, I stack a bunch of data into std::cin (e.g. by typing), then subsequent calls to std::getline correctly parse the lines of data that I have stuck in there. But again, once it is done reading through that data, it just keeps going in an infinite loop (i.e. not blocking on std::getline(std::cin,...)) even if there is no data to read! Ugh. So annoying.

Any help is deeply appreciated.

Cheers!

  • Which MPI implementation are you using, on which platform? – Dave Goodell Jan 07 '13 at 17:00
  • We're currently using MPICH-based MPI implementations (several flavors including MVAPICH, vanilla MPICH2, and MPICH-MX) on several different Linux-based platforms (the majority of which are S.L. or CentOS-based). I've not tried with OpenMPI, but some of the apps in this integrated system do not play nice with OpenMPI. I believe the problem must be centered around something that "mpiexec" is doing with STDIN. I will try to get a better snippet so that it can be easily reproduced with vanilla mpich. – user1950175 Jan 07 '13 at 20:50
  • I can reproduce your problem with your example code and MPICH-3.0.1. I'm looking into it. – Dave Goodell Jan 08 '13 at 17:22
  • Outstanding. Thanks a lot, Dave. – user1950175 Jan 10 '13 at 01:02
  • No fix yet, but it doesn't appear to actually be closing the underlying file descriptor. Instead, the `read` call on fd `0` is returning `EAGAIN`, as though stdin has been made nonblocking. I haven't yet found anywhere we are doing this, but there is definitely something funny happening here. We'll keep looking at it. – Dave Goodell Jan 10 '13 at 20:52
  • Thanks, Dave. I also tried to set the descriptor back to blocking, to no avail. I'll get back to looking at it again too ... after getting this other junk off my plate. – user1950175 Jan 11 '13 at 00:33
  • I wasn't able to figure out what the problem is in any reasonable amount of time. I've filed a ticket in our local bug tracker so that we don't forget about the issue: http://trac.mpich.org/projects/mpich/ticket/1782 – Dave Goodell Jan 16 '13 at 19:41
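
For anyone who wants to test the nonblocking-stdin hypothesis from the comments above, here is a minimal sketch using fcntl to inspect and clear O_NONBLOCK. The helper names are hypothetical, and this is only a diagnostic aid, not a confirmed fix: as noted above, setting the descriptor back to blocking did not help on its own.

```cpp
#include <cassert>
#include <fcntl.h>    // fcntl, F_GETFL, F_SETFL, O_NONBLOCK
#include <unistd.h>   // STDIN_FILENO

// Hypothetical helper: true if O_NONBLOCK is currently set on `fd`.
bool is_nonblocking(int fd) {
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFL);
    return flags != -1 && (flags & O_NONBLOCK) != 0;
}

// Hypothetical helper: clear O_NONBLOCK on `fd`. Returns true on success.
bool make_blocking(int fd) {
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1) return false;
    return fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) != -1;
}

// Hypothetical convenience wrapper for the stdin case discussed here.
bool restore_blocking_stdin() {
    return make_blocking(STDIN_FILENO);
}
```

An experiment worth trying would be to call `restore_blocking_stdin()` and then `std::cin.clear()` right after `system("mpiexec ./aprogram")` returns. Whether that combination cures the symptom is untested here.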

1 Answer


I think I fixed your problem. For me, a call to either Serial or Parallel blocked, and I think the cause was the std::cin.eof() test:

  std::getline(std::cin,choice);
  if(std::cin.eof()){
    std::cout << "STDIN has been closed." << std::endl;
    exit(1);
  }

However, changing this to std::cin.get() works for both the parallel run and the serial run:

  // get() returns the character read, or EOF once the stream has ended
  if(std::cin.get() == EOF) {
    std::cout << "STDIN has been closed." << std::endl;
    exit(1);
  }

Works on my system.