I am trying to parallelize a small part of my Python code in Fortran 90. As a start, I am trying to understand how process spawning works.
First, I spawned a Python child process from a Python parent, using the dynamic process management example from the mpi4py tutorial. Everything worked fine. In that case, from what I understand, only the inter-communicator between the parent process and the child process is used.
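For reference, here is a minimal sketch of that Python-to-Python case. It is not the tutorial's exact code: the file name spawn_demo.py, the helper role_from_argv, and the "parent"/"child" command-line argument are my own assumptions for illustration. It uses only the inter-communicator, so each side makes a single Disconnect call.

```python
import sys


def role_from_argv(argv):
    """Return 'parent' or 'child' if given as the first argument, else None."""
    if len(argv) > 1 and argv[1] in ("parent", "child"):
        return argv[1]
    return None


def run_parent():
    # Imported here so the file can be loaded without an MPI installation.
    from mpi4py import MPI
    import numpy

    # Spawn one child running this same file; inter_comm is an intercommunicator.
    inter_comm = MPI.COMM_SELF.Spawn(
        sys.executable, args=[__file__, "child"], maxprocs=1
    )
    data = numpy.array([42], dtype="int32")
    # On an intercommunicator, dest is a rank in the remote (child) group.
    inter_comm.Send([data, MPI.INT], dest=0, tag=0)
    # Only one communicator is shared, so there is one Disconnect per side.
    inter_comm.Disconnect()


def run_child():
    from mpi4py import MPI
    import numpy

    parent_comm = MPI.Comm.Get_parent()
    data = numpy.empty(1, dtype="int32")
    # source is a rank in the remote (parent) group.
    parent_comm.Recv([data, MPI.INT], source=0, tag=0)
    print("child received: {}".format(data[0]), flush=True)
    parent_comm.Disconnect()


if __name__ == "__main__":
    role = role_from_argv(sys.argv)
    if role == "parent":
        run_parent()
    elif role == "child":
        run_child()
```

Assuming the file is saved as spawn_demo.py, it would be started with python spawn_demo.py parent; the parent then launches the child copy itself.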
Then I moved on to spawning a Fortran 90 child process from a Python parent. For this, I used an example from a previous Stack Overflow post. The Python code (master.py) that spawns the Fortran child is as follows:
from mpi4py import MPI
import numpy
'''
slavef90 is an executable built starting from slave.f90
'''
# Spawning a process running an executable
# sub_comm is an MPI intercommunicator
sub_comm = MPI.COMM_SELF.Spawn('slavef90', args=[], maxprocs=1)
# common_comm is an intracommunicator across the Python process and the spawned process.
# All kinds of collective communication (Bcast...) are now possible between the Python process and the Fortran process
common_comm=sub_comm.Merge(False)
print('parent in common_comm ', common_comm.Get_rank(), ' of ', common_comm.Get_size())
data = numpy.arange(1, dtype='int32')
data[0]=42
print("Python sending message to fortran: {}".format(data))
common_comm.Send([data, MPI.INT], dest=1, tag=0)
print("Python over")
# disconnecting the shared communicators is required to finalize the spawned process.
sub_comm.Disconnect()
common_comm.Disconnect()
The corresponding Fortran 90 code (slave.f90), which runs as the spawned child process, is as follows:
program test
!
implicit none
!
include 'mpif.h'
!
integer :: ierr,s(1),stat(MPI_STATUS_SIZE)
integer :: parentcomm,intracomm
!
call MPI_INIT(ierr)
call MPI_COMM_GET_PARENT(parentcomm, ierr)
call MPI_INTERCOMM_MERGE(parentcomm, 1, intracomm, ierr)
call MPI_RECV(s, 1, MPI_INTEGER, 0, 0, intracomm,stat, ierr)
print*, 'fortran program received: ', s
call MPI_COMM_DISCONNECT(intracomm, ierr)
call MPI_COMM_DISCONNECT(parentcomm, ierr)
call MPI_FINALIZE(ierr)
endprogram test
I compiled the Fortran code with mpif90 slave.f90 -o slavef90 -Wall and ran the Python code normally with python master.py. I get the desired output, but the spawned process won't disconnect: any statements placed after the disconnect calls in the Fortran code (call MPI_COMM_DISCONNECT(intracomm, ierr) and call MPI_COMM_DISCONNECT(parentcomm, ierr)) are never executed, so any statements after the Disconnect calls in the Python code are never executed either, and the program never terminates in the terminal.
In this case, to my understanding, the inter-communicator is merged into an intra-communicator, so the child and parent processes are no longer two separate groups, and something seems to go wrong when disconnecting them. I have not been able to figure out a solution. I also tried rewriting the spawned child program in C++ and in Python and ran into the same problem. Any help is appreciated. Thanks.