
I am trying to understand how a Coarray Fortran DLL can be called from Python. Consider the following sample Fortran module file, example_mod.f90, which is to be called from Python later:

module example_mod
    use iso_c_binding
    implicit none
#ifdef COARRAY_ENABLED
    integer :: co_int[*]
#endif
    interface
    module subroutine sqr_2d_arr(nd, val, comm) BIND(C, NAME='sqr_2d_arr')
        !DEC$ ATTRIBUTES DLLEXPORT :: sqr_2d_arr
        integer, intent(in)     :: nd
        integer, intent(inout)  :: val(nd, nd), comm
    end subroutine sqr_2d_arr
    end interface
contains
end module example_mod

The subroutine's implementation is given in the submodule file example_mod@sub_smod.f90:

submodule (example_mod) sub_smod
    implicit none
contains
    module procedure sqr_2d_arr

        use mpi
        integer :: rank, size, ierr

        integer :: i, j

        call MPI_Comm_size(comm, size, ierr)
        call MPI_Comm_rank(comm, rank, ierr)
        write(*,"(*(g0,:,' '))") "Hello from Fortran MPI! I am process", rank, "of", size, ', comm:', comm

        write(*,"(*(g0,:,' '))") "Hello from Fortran COARRAY! I am image ", this_image(), " out of", num_images(), "images."
        sync all

        do j = 1, nd
            do i = 1, nd
                val(i, j) = (val(i, j) + val(j, i)) ** 2
            enddo
        enddo

    end procedure sqr_2d_arr
end submodule sub_smod

The subroutine also contains calls to the MPI library for comparison with Coarray. I compile this code with the following mpiifort commands:

mpiifort /Qcoarray=distributed /Od /debug:full /fpp -c example_mod.f90
mpiifort /Qcoarray=distributed /Od /debug:full /fpp -c example_mod@sub_smod.f90
mpiifort /Qcoarray=distributed /Od /debug:full /fpp /dll /libs:dll /threads example_mod.obj example_mod@sub_smod.obj

The following Python 2 script calls the DLL generated above:

#!/usr/bin/env python

from __future__ import print_function
from mpi4py import MPI


comm = MPI.COMM_WORLD
fcomm = MPI.COMM_WORLD.py2f()
print("Hello from Python! I'm rank %d from %d running in total..." % (comm.rank, comm.size))

comm.Barrier()   # wait for everybody to synchronize _here_

######################

import ctypes as ct
import numpy as np

# import the dll
fortlib = ct.CDLL('example_mod.dll')

# setup the data
N = 2
nd = ct.pointer( ct.c_int(N) )          # setup the pointer
pyarr = np.arange(0, N, dtype=int) * 5  # setup the N-long
for i in range(1, N):                   # concatenate columns until it is N x N
    pyarr = np.c_[pyarr, np.arange(0, N, dtype=int) * 5]

# call the function by passing the ctypes pointer using the numpy function:
fcomm_pt = ct.pointer( ct.c_int(fcomm) )
_ = fortlib.sqr_2d_arr(nd, np.ctypeslib.as_ctypes(pyarr),fcomm_pt)

print(pyarr)

Running this script with the following command:

mpiexec -np 4 python main.py

yields this output:

Hello from Fortran MPI! I am process 1 of 4 , comm: 1140850688
Hello from Fortran MPI! I am process 3 of 4 , comm: 1140850688
Hello from Fortran COARRAY! I am image  1  out of 0 images.
Hello from Fortran MPI! I am process 0 of 4 , comm: 1140850688
Hello from Fortran COARRAY! I am image  1  out of 0 images.
Hello from Fortran MPI! I am process 2 of 4 , comm: 1140850688
Hello from Fortran COARRAY! I am image  1  out of 0 images.
Hello from Fortran COARRAY! I am image  1  out of 0 images.
Hello from Python! I'm rank 3 from 4 running in total...
[[  0  25]
 [900 100]]
Hello from Python! I'm rank 0 from 4 running in total...
[[  0  25]
 [900 100]]
Hello from Python! I'm rank 1 from 4 running in total...
[[  0  25]
 [900 100]]
Hello from Python! I'm rank 2 from 4 running in total...
[[  0  25]
 [900 100]]
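For reference, the array values printed above can be reproduced without the DLL. The following is a minimal sketch, assuming the Fortran default integer is 32-bit and that the subroutine sees the NumPy buffer in column-major order; the helper name sqr_2d_arr_reference is hypothetical and exists only for this illustration. It suggests that the numerical part of the call behaves as expected and that only the Coarray intrinsics misbehave.

```python
import numpy as np


def sqr_2d_arr_reference(arr):
    """Emulate the Fortran loop in place (hypothetical helper, not part of the DLL).

    `arr` is a C-contiguous n x n array. Its transpose is a view of the same
    memory in which val[i, j] corresponds to the Fortran element val(i+1, j+1).
    """
    n = arr.shape[0]
    val = arr.T  # column-major view of the same buffer, no copy
    for j in range(n):
        for i in range(n):
            # Updates happen in place, so later iterations see earlier results,
            # exactly as in the Fortran subroutine.
            val[i, j] = (val[i, j] + val[j, i]) ** 2
    return arr


# Build the same input as in the script above (assumption: 32-bit integers).
N = 2
pyarr = np.arange(0, N, dtype=np.int32) * 5
for _ in range(1, N):
    pyarr = np.c_[pyarr, np.arange(0, N, dtype=np.int32) * 5]

sqr_2d_arr_reference(pyarr)
print(pyarr)
```

With N = 2 this prints the same 2 x 2 array as each MPI rank above. The value 900 arises because the loop reads val(2, 1) after it has already been overwritten with 25.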

The computations performed in this code are not relevant to the discussion here. However, I cannot understand why the MPI ranks are reported correctly while the Coarray num_images() returns zero on every process and this_image() returns 1 everywhere. As a broader question, what is the best strategy for writing a Coarray Fortran application that can be called from other languages such as Python?

Scientist
  • I understand that you have incorrect result when using multiprocessing. I heard that sharing a DLL while using python multiprocessing doesn't work very well. A workaround is to create a copy of the DLL for each process. – Jean-François Fabre Feb 16 '19 at 08:45
